This paper derives the rate of convergence and asymptotic distribution for a class of 
Kolmogorov-Smirnov style test statistics for conditional moment inequality models for 
parameters on the boundary of the identified set under general conditions. In contrast 
to other moment inequality settings, the rate of convergence is faster than root-n, 
and the asymptotic distribution depends entirely on nonbinding moments. The results 
require the development of new techniques that draw a connection between moment 
selection, irregular identification, bandwidth selection and nonstandard M-estimation. 
Using these results, I propose tests that are more powerful than existing approaches 
for choosing critical values for this test statistic. I quantify the power improvement 
by showing that the new tests can detect alternatives that converge to points on the 
identified set at a faster rate than those detected by existing approaches. A Monte Carlo 
study confirms that the tests and the asymptotic approximations they use perform well 
in finite samples. In an application to a regression of prescription drug expenditures on 
income with interval data from the Health and Retirement Study, confidence regions 
based on the new tests are substantially tighter than those based on existing methods. 
1 Introduction 



Theoretical restrictions used for estimation of economic models often take the form of
moment inequalities. Examples include models of consumer demand and strategic interactions
between firms, bounds on treatment effects using instrumental variables restrictions and
various forms of censored and missing data (see, among many others, Manski and Tamer, 2002;
Manski, 1990; Pakes, Porter, Ho, and Ishii, 2006; Ciliberto and Tamer, 2009; Chetty, 2010,
and papers cited therein). For these models, the restriction often takes the form
of moment inequalities conditional on some observed variable. That is, given a sample
$(X_1, W_1), \ldots, (X_n, W_n)$, we are interested in testing a null hypothesis of the form
$E(m(W_i, \theta) | X_i) \ge 0$ with probability one, where the inequality is taken elementwise
if $m(W_i, \theta)$ is a vector. Here, $m(W_i, \theta)$ is a known function of an observed
random variable $W_i$, which may include $X_i$, and a parameter $\theta \in \mathbb{R}^{d_\theta}$,
and the moment inequality defines the identified set
$\Theta_0 = \{\theta \mid E(m(W_i, \theta) | X_i) \ge 0 \text{ a.s.}\}$ of parameter values that
cannot be ruled out by the data and the restrictions of the model.

In this paper, I consider inference in models defined by conditional moment inequalities.
I focus on test statistics that exploit the equivalence between the null hypothesis
$E(m(W_i, \theta) | X_i) \ge 0$ almost surely and $E\, m(W_i, \theta) I(s < X_i < s + t) \ge 0$
for all $(s, t)$. Thus, we can use $\inf_{s,t} \frac{1}{n} \sum_{i=1}^n m(W_i, \theta) I(s < X_i < s + t)$,
or the infimum of some weighted version of the unconditional moments indexed by $(s, t)$.
Following the terminology commonly used in the literature, I refer to these as
Kolmogorov-Smirnov (KS) style test statistics. The main contribution of this paper is to
derive the rate of convergence and asymptotic distribution of this test statistic for
parameters on the boundary of the identified set under a general set of conditions. The
asymptotic distributions derived in this paper and the methods used to derive them fall into
a different category than other asymptotic distributions derived in the conditional moment
inequalities and goodness-of-fit testing literatures. Instead, the asymptotic distributions
and rates of convergence derived here resemble more closely those of maximized objective
functions for nonstandard M-estimators (see, for example, Kim and Pollard, 1990), but require
new methods to derive. The results draw a connection between moment selection, bandwidth
selection, irregular identification and nonstandard M-estimation.
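To make the statistic concrete, the following is a minimal sketch of the unweighted KS style statistic for a scalar conditioning variable and a scalar moment function. The function name `ks_infimum` and the data are hypothetical, and the sketch assumes the observed values of the conditioning variable are distinct. Under that assumption an open interval $(s, s + t)$ picks out a run of consecutive observations in the ordering of the conditioning variable, so the infimum over $(s, t)$ reduces to a smallest contiguous block sum.

```python
def ks_infimum(x, y):
    """inf over (s, t) of (1/n) * sum_i y_i * 1{s < x_i < s + t}, scalar x.

    Assumes the x_i are distinct. An open interval then selects a run of
    consecutive observations in x-order (possibly none), so the infimum is
    the smallest contiguous-block sum of y sorted by x, divided by n.
    """
    n = len(x)
    y_sorted = [yi for _, yi in sorted(zip(x, y))]
    best = 0.0    # the empty interval contributes a sum of zero
    block = 0.0   # smallest sum of a block ending at the current point
    for yi in y_sorted:
        block = min(block + yi, yi)
        best = min(best, block)
    return best / n


# Hypothetical data: y plays the role of m(W_i, theta) at a fixed theta.
x = [0.4, 0.1, 0.3, 0.2]
y = [3.0, 1.0, -1.0, -2.0]
stat = ks_infimum(x, y)  # the block with x in {0.2, 0.3} sums to -3
```

The single pass over the sorted data avoids enumerating all pairs $(s, t)$; with several conditioning variables or a weighting function a different search over boxes would be needed.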






While asymptotic distribution results are available for this statistic in some cases
(Andrews and Shi, 2009; Kim, 2008), the existing results give only a conservative upper
bound of $\sqrt{n}$ on the rate of convergence of this test statistic in a large class of
important cases. For example, in the interval regression model, the asymptotic distribution
of this test statistic for parameters on the boundary of the identified set and the proper
scaling needed to achieve it have so far been unknown in the generic case (see Section 2 for
the definition of this model). In these cases, results available in the literature do not
give an asymptotic distribution result, but state only that the test statistic converges in
probability to zero when scaled up by $\sqrt{n}$. This paper derives the scaling that leads
to a nondegenerate asymptotic distribution and characterizes this distribution. Existing
results can be used for conservative inference in these cases (along with tuning parameters
to prevent the critical value from going to zero), but lose power relative to procedures
that use the results derived in this paper to choose critical values based on the asymptotic
distribution of the test statistic on the boundary of the identified set.

To quantify this power improvement, I show that using the asymptotic distributions
derived in this paper gives power against sequences of parameter values that approach points
on the boundary of the identified set at a faster rate than those detected using root-n
convergence to a degenerate distribution. Since local power results have not been available
for the conservative approach based on root-n approximations in this setting, making this
comparison involves deriving new local power results for the existing tests in addition to the
new tests. The increase in power is substantial. In the leading case considered in Section 3,
I find that the methods developed in this paper give power against local alternatives
that approach the identified set at a $n^{-2/(d_X+4)}$ rate (where $d_X$ is the dimension of
the conditioning variable), while using conservative $\sqrt{n}$ approximations only gives
power against $n^{-1/(d_X+2)}$ alternatives. The power improvements are not completely free,
however, as the new tests require smoothness conditions not needed for existing approaches.
In another paper (Armstrong, 2011), I propose a modification of this test statistic that
achieves a similar power improvement (up to a $\log n$ term) without sacrificing the
robustness of the conservative approach. See Section 10 for more on these tradeoffs.

To examine how well these asymptotic approximations describe sample sizes of practical
importance, I perform a Monte Carlo study. Confidence regions based on the tests proposed
in this paper have close to the nominal coverage in the Monte Carlo experiments, and shrink
to the identified set at a faster rate than those based on existing tests. In addition, I
provide an empirical illustration examining the relationship between out-of-pocket
prescription spending and income in a data set in which out-of-pocket prescription spending
is sometimes missing or reported as an interval. Confidence regions for this application
constructed using the methods in this paper are substantially tighter than those that use
existing methods (these confidence regions are reported in Figures [H] and 9 and Table 5;
see Section 9 for the details of the empirical illustration).
While the asymptotic distribution results in this paper are technical in nature, the key
insights can be described at an intuitive level. I provide a nontechnical exposition of these
ideas in Section 2. Together with the statements of the asymptotic distribution results in
Section 3 and the local power results in Section 7, this provides a general picture of the
results of the paper. The rest of this section discusses the relation of these results to the
rest of the literature, and introduces notation and definitions. Section 5 generalizes the
asymptotic distribution results of Section 3, and Sections 4 and 6 deal with estimation of
the asymptotic distribution for feasible inference. Section 8 presents Monte Carlo results.
Section 9 presents the empirical illustration. In Section 10, I discuss some implications of
these results beyond the immediate application to constructing asymptotically exact tests.
Section 11 concludes. Proofs are in the appendix.



1.1 Related Literature 



The results in this paper relate to recent work on testing conditional moment inequalities,
including papers by Andrews and Shi (2009), Kim (2008), Khan and Tamer (2009),
Chernozhukov, Lee, and Rosen (2009), Lee, Song, and Whang (2011), Ponomareva (2010),
Menzel (2008) and Armstrong (2011). The results on the local power of asymptotically exact
and conservative KS statistic based procedures derived in this paper are useful for
comparing confidence regions based on KS statistics to other methods of inference on the
identified set proposed in these papers. Armstrong (2011) derives local power results for
some common alternatives to the KS statistics based on integrated moments considered in this
paper (the confidence regions considered in that paper satisfy the stronger criterion of
containing the entire identified set, rather than individual points, with a prespecified
probability). I compare the local power calculations in this paper with those results in
Section 10.

Out of these existing approaches to inference on conditional moment inequalities, the
papers that are most closely related to this one are those by Andrews and Shi (2009) and
Kim (2008), both of which consider statistics based on integrating the conditional
inequality. As discussed above, the main contributions of the present paper relative to these
papers are (1) deriving the rate of convergence and nondegenerate asymptotic distribution of
this statistic for parameters on the boundary of the identified set in the common case where
the results in these papers reduce to a statement that the statistic converges to zero at a
root-n scaling and (2) deriving local power results that show how much power is gained by
using critical values based on these new results. Armstrong (2011) uses a statistic similar to
the one considered here, but proposes an increasing sequence of weightings ruled out by the
assumptions of the rest of the literature (including the present paper). This leads to almost
the same power improvement as the methods in this paper even when conservative critical
values are used. Khan and Tamer (2009) propose a statistic similar to one considered here
for a model defined by conditional moment inequalities, but consider point estimates and
confidence intervals based on these estimates under conditions that lead to point
identification. Galichon and Henry (2009) propose a similar statistic for a class of partially
identified models under a different setup. Statistics based on integrating conditional
moments have been used widely in other contexts as well, and go back at least to
Bierens (1982).

The literature on models defined by finitely many unconditional moment inequalities is
more developed, but still recent. Papers in this literature include Andrews, Berry, and
Jia (2004), Andrews and Jia (2008), Andrews and Guggenberger (2009), Andrews and
Soares (2010), Chernozhukov, Hong, and Tamer (2007), Romano and Shaikh (2010), Romano and
Shaikh (2008), Bugni (2010), Beresteanu and Molinari (2008), Moon and Schorfheide (2009),
Imbens and Manski (2004) and Stoye (2009). While most of this literature does not apply
directly to the problems considered in this paper when the conditioning variable is
continuous, ideas from these papers have been used in the literature on conditional moment
inequality models and other problems involving inference on sets. Indeed, some of these
results are stated in a broad enough way to apply to the general problem of inference on
partially identified models.



1.2 Notation and Definitions 

Throughout this paper, I use the terms asymptotically exact and asymptotically conservative
to refer to the behavior of tests for a fixed parameter value under a fixed probability
distribution. I refer to a test as asymptotically exact for testing a parameter $\theta$
under a data generating process $P$ such that the null hypothesis holds if the probability of
rejecting $\theta$ converges to the nominal level as the number of observations increases to
infinity under $P$. I refer to a test as asymptotically conservative for testing a parameter
$\theta$ under a data generating process $P$ if the probability of falsely rejecting $\theta$
is asymptotically strictly less than the nominal level under $P$. While this contrasts with a
definition where a test is conservative only if the size of the test is less than the nominal
size taken as the supremum of the probability of rejection over a composite null of all
possible values of $\theta$ and $P$ such that $\theta$ is in the identified set under $P$, it
facilitates discussion of results like the ones in this paper (and other papers that deal
with issues related to moment selection) that characterize the behavior of tests for
different values of $\theta$ in the identified set.

I use the following notation in the rest of the paper. For observations
$(X_1, W_1), \ldots, (X_n, W_n)$ and a measurable function $h$ on the sample space,
$E_n h(X_i, W_i) = \frac{1}{n} \sum_{i=1}^n h(X_i, W_i)$ denotes the sample mean. I use double
subscripts to denote elements of vector observations so that $X_{i,j}$ denotes the $j$th
component of the $i$th observation $X_i$. Inequalities on Euclidean space refer to the partial
ordering of elementwise inequality. For a vector valued function $h$ with values in
$\mathbb{R}^m$, the infimum of $h$ over a set $T$ is defined to be the vector consisting of the
infimum of each element: $\inf_{t \in T} h(t) = (\inf_{t \in T} h_1(t), \ldots, \inf_{t \in T} h_m(t))$.
I use $a \wedge b$ to denote the elementwise minimum and $a \vee b$ to denote the elementwise
maximum of $a$ and $b$. The notation $\lceil x \rceil$ denotes the least integer greater than
or equal to $x$.

2 Overview of Results 

The asymptotic distributions derived in this paper arise when the conditional moment in- 
equality binds only on a probability zero set. In contrast to inference with finitely many 
unconditional moment inequalities, in which at least one moment inequality will bind on 
the boundary of the identified set and limiting distributions of test statistics are degenerate 
only on the interior of the identified set, this lack of nondegenerate binding moments holds 
even on the boundary of the identified set in typical applications. This leads to a faster than 
root-n rate of convergence to an asymptotic distribution that depends entirely on moments 
that are close to, but not quite binding. 

To see why this case is typical in applications, consider an application of moment
inequalities to regression with interval data. In the interval regression model,
$E(W_i^* | X_i) = X_i'\beta$, and $W_i^*$ is unobserved, but known to be between observed
variables $W_i^L$ and $W_i^H$, so that $\beta$ satisfies the moment inequalities

$$E(W_i^L | X_i) \le X_i'\beta \le E(W_i^H | X_i).$$

Suppose that the distribution of $X_i$ is absolutely continuous with respect to the Lebesgue
measure. Then, to have one of these inequalities bind on a positive probability set,
$E(W_i^L | X_i)$ or $E(W_i^H | X_i)$ will have to be linear on this set. Even if this is the
case, this only means that the moment inequality will bind on this set for one value of
$\beta$, and the moment inequality will typically not bind when applied to nearby values of
$\beta$ on the boundary of the identified set. Figures 1 and 2 illustrate this for the case
where the conditioning variable is one dimensional. Here, the horizontal axis is the
nonconstant part of $x$, and the vertical axis plots the conditional mean of $W_i^H$ along
with regression functions corresponding to points in the identified set. Figure 1 shows a
case where the KS statistic converges at a faster than root-n rate. In Figure 2, the
parameter $\beta_1$ leads to convergence at exactly a root-n rate, but this is a knife edge
case, since the KS statistic for testing $\beta_2$ will converge at a faster rate.
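As an illustration of how the interval regression inequalities map into a moment function $m(W_i, \beta)$ with nonnegative conditional mean under the null, the sketch below stacks the two inequalities for a single observation. The function name `interval_moments` and the numbers are hypothetical, and the regressor vector is assumed to include the constant when forming the fitted value.

```python
def interval_moments(x, w_low, w_high, beta):
    """Moment function for one observation in the interval regression model.

    Returns (x'beta - W^L, W^H - x'beta). Both components have nonnegative
    conditional mean given x when beta satisfies
    E(W^L | X) <= X'beta <= E(W^H | X).
    """
    fitted = sum(xj * bj for xj, bj in zip(x, beta))
    return (fitted - w_low, w_high - fitted)


# Hypothetical observation: constant and one regressor, interval [0.5, 3.0].
m_lower, m_upper = interval_moments([1.0, 2.0], 0.5, 3.0, [1.0, 0.5])
```

The second component is the upper-bound residual $W_i^H - X_i'\beta$, which is the quantity averaged over the sets $\{s < X_i < s + t\}$ when the KS statistic is computed for the upper bound.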

This paper derives asymptotic distributions under conditions that generalize these cases
to arbitrary moment functions $m(W_i, \theta)$. In this broader setting, KS statistics
converge at a faster than root-n rate on the boundary of the identified set under general
conditions when the model is set identified and at least one conditioning variable is
continuously distributed. In interval quantile regression, contact sets for the conditional
median translate to contact sets for the conditional mean of the moment function, leading to
faster than root-n rates of convergence in similar settings. Bounds in selection models, such
as those proposed by Manski (1990), lead to a similar setup to the interval regression model,
as do some of the structural models considered by Pakes, Porter, Ho, and Ishii (2006), with
the intervals depending on a first stage parameter estimate. See Armstrong (2011) for
primitive conditions for a set of high-level conditions similar to the ones used in this
paper for some of these models.

While the results hold more generally, the rest of this section describes the results in
the context of the interval regression example in a particular case. Consider deriving the
rate of convergence and nondegenerate asymptotic distribution of the KS statistic for a
parameter $\beta$ like the one shown in Figure 1, but with $X_i$ possibly containing more than
one covariate. Since the lower bound never binds, it is intuitively clear that the KS
statistic for the lower bound will converge to zero at a faster rate than the KS statistic
for the upper bound, so consider the KS statistic for the upper bound given by
$\inf_{s,t} E_n Y_i I(s < X_i < s + t)$ where $Y_i = W_i^H - X_i'\beta$. If $E(W_i^H | X_i = x)$
is tangent to $x'\beta$ at a single point $x_0$, and $E(W_i^H | X_i = x)$ has a positive
definite second derivative matrix $V$ at this point, we will have
$E(Y_i | X_i = x) \approx (x - x_0)'V(x - x_0)$ near $x_0$, so that, for $s$ near $x_0$ and $t$
close to zero,

$$E Y_i I(s < X_i < s + t) \approx f_X(x_0) \int_{s_1}^{s_1+t_1} \cdots \int_{s_{d_X}}^{s_{d_X}+t_{d_X}} (x - x_0)'V(x - x_0) \, dx_{d_X} \cdots dx_1$$

(here, if the regression contains a constant, the conditioning variable $X_i$ is redefined to
be the nonconstant part of the regressor, so that $d_X$ refers to the dimension of the
nonconstant part of $X_i$).

Since $E Y_i I(s < X_i < s + t) = 0$ only when $Y_i I(s < X_i < s + t)$ is degenerate, the
asymptotic behavior of the KS statistic should depend on indices $(s, t)$ where the moment
inequality is not quite binding, but close enough to binding that sampling error makes
$E_n Y_i I(s < X_i < s + t)$ negative some of the time. To determine on which indices $(s, t)$
we should expect this to happen, split up the process in the KS statistic into a mean zero
process and a drift term: $(E_n - E) Y_i I(s < X_i < s + t) + E Y_i I(s < X_i < s + t)$. In
order for this to be strictly negative some of the time, there must be non-negligible
probability that the mean zero process is greater in absolute value than the drift term. That
is, we must have $sd((E_n - E) Y_i I(s < X_i < s + t))$ of at least the same order of
magnitude as $E Y_i I(s < X_i < s + t)$. The idea is similar to rate of convergence arguments
for M-estimators with possibly nonstandard rates of convergence, such as those considered by
Kim and Pollard (1990). We have
$sd((E_n - E) Y_i I(s < X_i < s + t)) = O(\sqrt{\prod_i t_i} / \sqrt{n})$ for small $t$, and
some calculations show that, for $s$ close to $x_0$,

$$E Y_i I(s < X_i < s + t) \approx f_X(x_0) \int_{s_1}^{s_1+t_1} \cdots \int_{s_{d_X}}^{s_{d_X}+t_{d_X}} (x - x_0)'V(x - x_0) \, dx_{d_X} \cdots dx_1 \ge C \|(s - x_0, t)\|^2 \prod_i t_i$$

for some $C > 0$. Thus, we expect the asymptotic distribution to depend on $(s, t)$ such that
$\sqrt{\prod_i t_i} / \sqrt{n}$ is of the same or greater order of magnitude than
$\|(s - x_0, t)\|^2 \prod_i t_i$, which corresponds to
$\|(s - x_0, t)\|^2 \sqrt{\prod_i t_i}$ less than or equal to $O(1/\sqrt{n})$.

To get the main intuition for the rate of convergence, let us first suppose that $s - x_0$
is of the same order of magnitude as $t$, and that the components $t_i$ of $t$ are of the same
order of magnitude. I argue separately below that cases where components of $(s, t)$ converge
at different rates do not matter for the asymptotic distribution. If $s - x_0$ and all
components $t_i$ are to converge to zero at the same rate $h_n$, we must have
$\|(s - x_0, t)\| = O(h_n)$ and $\prod_i t_i = O(h_n^{d_X})$, so that, if
$\|(s - x_0, t)\|^2 \sqrt{\prod_i t_i} \le O(1/\sqrt{n})$, we will have
$O(1/\sqrt{n}) \ge h_n^2 \sqrt{h_n^{d_X}} = h_n^{2 + d_X/2}$, so that
$h_n \le O(1/n^{1/(2(2 + d_X/2))}) = O(n^{-1/(4 + d_X)})$. Then, for $(s, t)$ with
$(s - x_0, t)$ in an $h_n$-neighborhood of zero, we will have
$(E_n - E) Y_i I(s < X_i < s + t) = O_p(\sqrt{h_n^{d_X}} / \sqrt{n}) = O_p(n^{-(d_X+2)/(d_X+4)})$.
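The exponent arithmetic in this heuristic can be checked mechanically. The sketch below (the function name `ks_rate_exponents` is hypothetical) solves the balance condition $h_n^{2 + d_X/2} = n^{-1/2}$ for the window width and then plugs it into the standard deviation $\sqrt{h_n^{d_X}/n}$, using exact rational arithmetic.

```python
from fractions import Fraction


def ks_rate_exponents(d_x):
    """Return (a, b) with h_n = n**(-a) and the KS statistic of order n**(-b).

    a solves (2 + d_x / 2) * a = 1 / 2, i.e. h_n^(2 + d_x/2) = n^(-1/2).
    b is the exponent of sqrt(h_n^d_x / n) = n^(-(d_x * a + 1) / 2).
    """
    d = Fraction(d_x)
    a = 1 / (d + 4)
    b = (d * a + 1) / 2
    return a, b


window, rate = ks_rate_exponents(1)  # scalar conditioning variable
```

For a scalar conditioning variable this gives a window of order $n^{-1/5}$ and a statistic of order $n^{-3/5}$, and in general the second exponent simplifies to $(d_X + 2)/(d_X + 4)$, which exceeds $1/2$ for every $d_X \ge 1$.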

Next suppose that $s$ converges to $x_0$ more slowly than $h_n = n^{-1/(d_X+4)}$ or that one
of the components of $t$ converges to zero more slowly than $h_n$. In this case, we will have
$\|(s - x_0, t)\|$ greater than some sequence $k_n$ with $k_n / h_n \to \infty$, so that, to
have $\|(s - x_0, t)\|^2 \sqrt{\prod_i t_i} \le O(1/\sqrt{n})$, we would have to have
$\sqrt{\prod_i t_i} \le O(1/(k_n^2 \sqrt{n}))$, so that $(E_n - E) Y_i I(s < X_i < s + t)$
will be of order less than $1/(k_n^2 n)$, which goes to zero at a faster rate than the
$n^{-(d_X+2)/(d_X+4)}$ rate that we get when the components of $(s, t)$ converge at the same
rate.

Thus, we should expect that the values of $(s, t)$ that matter for the asymptotic
distribution of the KS statistic are those with $(s - x_0, t)$ of order $n^{-1/(d_X+4)}$, and
that the KS statistic will converge in distribution when scaled up by $n^{(d_X+2)/(d_X+4)}$
to the infimum of the limit of a sequence of local objective functions indexed by $(s, t)$
with $(s - x_0, t)$ in a sequence of $n^{-1/(d_X+4)}$ neighborhoods of zero. Formalizing this
argument requires showing that this intuition holds uniformly in $(s, t)$. The formal proof
uses a "peeling" argument along the lines of Kim and Pollard (1990), but a different type of
argument is needed for regions where, even though $\|(s - x_0, t)\|$ is far from zero, some
components of $t$ are small enough that $E_n Y_i I(s < X_i < s + t)$ may be slightly negative
because the region $\{s < X_i < s + t\}$ is small and happens to catch a few observations
with $Y_i < 0$. The proof formalizes the intuition that these regions cannot matter for the
asymptotic distribution, since $\prod_i t_i$ must be much smaller than when $s$ is close to
$x_0$ and the components of $t$ are of the same order of magnitude as each other.

These results can be used for inference once the asymptotic distribution is estimated. In
Section 4, I describe two procedures for estimating this asymptotic distribution. The first
is a generic subsampling procedure that uses only the fact that the statistic converges to 
a nondegenerate distribution at a known rate. The second is based on estimating a finite 
dimensional set of objects that allows this distribution to be simulated. 
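The first idea can be sketched generically. The code below is a textbook-style subsampling scheme under stated assumptions, not necessarily the exact procedure developed later in the paper: it takes a scalar conditioning variable with distinct values, a scalar moment, the rate exponent $(d_X + 2)/(d_X + 4) = 3/5$ for $d_X = 1$, and the function $S(v) = |v \wedge 0|$. All function names, the subsample size and the simulated data are hypothetical.

```python
import math
import random


def ks_infimum(x, y):
    """inf over (s, t) of (1/n) sum_i y_i 1{s < x_i < s + t}; scalar, distinct x."""
    y_sorted = [yi for _, yi in sorted(zip(x, y))]
    best = block = 0.0
    for yi in y_sorted:
        block = min(block + yi, yi)   # smallest block sum ending here
        best = min(best, block)
    return best / len(x)


def scaled_statistic(x, y, rate):
    """m**rate * |inf ^ 0| for a sample of size m = len(x)."""
    return len(x) ** rate * abs(min(ks_infimum(x, y), 0.0))


def subsample_critical_value(x, y, b, level=0.05, draws=500, seed=0):
    """Empirical (1 - level) quantile of the statistic over subsamples of size b."""
    rate = 3.0 / 5.0                  # assumed known rate for d_X = 1
    rng = random.Random(seed)
    stats = []
    for _ in range(draws):
        idx = rng.sample(range(len(x)), b)   # draw without replacement
        stats.append(scaled_statistic([x[i] for i in idx],
                                      [y[i] for i in idx], rate))
    stats.sort()
    k = min(draws - 1, math.ceil((1.0 - level) * draws) - 1)
    return stats[k]


# Hypothetical data: conditional mean 4 (x - 0.5)^2 touches zero at x = 0.5.
rng = random.Random(1)
x = [rng.random() for _ in range(200)]
y = [4.0 * (xi - 0.5) ** 2 + rng.uniform(-1.0, 1.0) for xi in x]
cv = subsample_critical_value(x, y, b=40)
reject = scaled_statistic(x, y, 3.0 / 5.0) > cv
```

The only model-specific input is the rate exponent, which is what makes the scheme generic; the choice of subsample size `b` is a tuning parameter left unspecified here.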

Both procedures rely on the conditional mean having a positive definite second derivative
matrix near its minimum. To form tests that are asymptotically valid under more general
conditions, I propose pre-tests for these conditions, and embed these tests in a procedure
that uses the asymptotic approximation to the null distribution for which the pre-test finds
evidence. I describe these pre-tests in Section 6, but, before doing this, I extend the
results of Section 3 to a broader class of shapes of the conditional mean in Section 5. These
results are useful for the pre-tests in Section 6.1, which adapt methods from Politis,
Romano, and Wolf (1999) for estimating rates of convergence to this setting. Section 6.2
describes another pre-test for the conditions of Section 3, this one based on estimating the
second derivative and testing for positive definiteness. The pre-tests are valid under
regularity conditions governing the smoothness of the conditional mean.

One of the appealing features of using asymptotically exact critical values over
conservative ones is the potential for more power against parameters outside of the
identified set. In Section 7, I consider power against local alternatives. I describe the
intuition for the results in more detail in that section, but the main idea is that, for a
sequence of alternatives $\theta_n$ converging to a point $\theta$ on the identified set under
which the argument described above goes through, the drift process has an additional term
$E(m(W_i, \theta_n) - m(W_i, \theta)) I(s < X_i < s + t)$, where $s - x_0$ and $t$ are of order
$h_n$. The exact asymptotics will detect $\theta_n$ when this term is of order
$n^{-(d_X+2)/(d_X+4)}$, while conservative asymptotics will have power only when
$\theta_n - \theta$ is large enough so that this term is of order $n^{-1/2}$. This leads to
power against local alternatives of order $n^{-2/(d_X+4)}$ for the asymptotically exact
critical values, and $n^{-1/(d_X+2)}$ when the conservative $\sqrt{n}$ approximation is used.
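The two local alternative rates can be reproduced from simple balance conditions. The sketch below rests on heuristic assumptions that are mine rather than statements from the text: the additional drift term is taken to scale like $\delta_n \prod_i t_i$ with $\delta_n = \|\theta_n - \theta\|$, the exact test uses windows of width $n^{-1/(d_X+4)}$, and for the conservative test the most negative drift $c h^{2 + d_X} - \delta_n h^{d_X}$ is taken to occur at $h$ of order $\delta_n^{1/2}$. The function name is hypothetical.

```python
from fractions import Fraction


def local_alternative_exponents(d_x):
    """Return (exact, conservative): detectable delta_n is of order n**(-value).

    exact:        delta * h^d = n^(-(d + 2)/(d + 4)) with h = n^(-1/(d + 4)).
    conservative: delta^((d + 2)/2) = n^(-1/2), from the window h ~ delta^(1/2).
    """
    d = Fraction(d_x)
    h = 1 / (d + 4)
    exact = (d + 2) / (d + 4) - d * h
    conservative = Fraction(1, 2) / ((d + 2) / 2)
    return exact, conservative


exact, conservative = local_alternative_exponents(1)
```

With one conditioning variable the exponents are $2/5$ and $1/3$. Since $2/(d_X + 4) > 1/(d_X + 2)$ for every $d_X \ge 1$, the exact critical values detect alternatives that shrink toward the identified set strictly faster in every dimension.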
3 Asymptotic Distribution of the KS Statistic 



Given iid observations $(X_1, W_1), \ldots, (X_n, W_n)$ of random variables
$X_i \in \mathbb{R}^{d_X}$, $W_i \in \mathbb{R}^{d_W}$, we wish to test the null hypothesis that
$E(m(W_i, \theta) | X_i) \ge 0$ almost surely, where
$m : \mathbb{R}^{d_W} \times \Theta \to \mathbb{R}^{d_Y}$ is a known measurable function and
$\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$ is a fixed parameter value. I use the
notation $\bar{m}(\theta, x)$ to denote a version of $E(m(W_i, \theta) | X = x)$ (it will be
clear from context which version is meant when this matters). In some cases when it is clear
which parameter value is being tested, I will define $Y_i = m(W_i, \theta)$ for notational
convenience. Defining $\Theta_0$ to be the identified set of values of $\theta$ in $\Theta$
that satisfy $E(m(W_i, \theta) | X_i) \ge 0$ almost surely, these tests can then be inverted to
obtain a confidence region that, for every $\theta_0 \in \Theta_0$, contains $\theta_0$ with a
prespecified probability (Imbens and Manski, 2004). The tests considered here will be
based on asymptotic approximations, so that these statements will only hold asymptotically.

The results in this paper allow for asymptotically exact inference using KS style statistics
in cases where the $\sqrt{n}$ approximations for these statistics are degenerate. This
includes the case described in the introduction in which one component of
$E(m(W_i, \theta) | X_i)$ is tangent to zero at a single point and the rest are bounded away
from zero. While this case captures the essential intuition for the results in this paper, I
state the results in a slightly more general way in order to make them more broadly
applicable. I allow each component of $E(m(W_i, \theta) | X)$ to be tangent to zero at finitely
many points, which may be different for each component. This is relevant in the interval
regression example for parameters for which the regression line is tangent to
$E(W_i^H | X)$ and $E(W_i^L | X)$ at different points. In the case of an interval regression
on a scalar and a constant, the points in the identified set corresponding to the largest and
smallest values of the slope parameter will typically have this property.

I consider KS style statistics that are a function of
$\inf_{s,t} E_n m(W_i, \theta) I(s < X_i < s + t) = (\inf_{s,t} E_n m_1(W_i, \theta) I(s < X_i < s + t), \ldots, \inf_{s,t} E_n m_{d_Y}(W_i, \theta) I(s < X_i < s + t))$.
Fixing some function $S : \mathbb{R}^{d_Y} \to \mathbb{R}_+$, we can then reject for large
values of $S(\inf_{s,t} E_n m(W_i, \theta) I(s < X_i < s + t))$ (which correspond to more
negative values of the components of $\inf_{s,t} E_n m(W_i, \theta) I(s < X_i < s + t)$ for
typical choices of $S$). Note that this is different in general than taking
$\sup_{s,t} S(E_n m(W_i, \theta) I(s < X_i < s + t))$, although similar ideas will apply here.
Also, the moments $E_n m(W_i, \theta) I(s < X_i < s + t)$ are not weighted, but the results
could be extended to allow for a weighting function $\omega(s, t)$, so that the infimum is
over $\omega(s, t) E_n m(W_i, \theta) I(s < X_i < s + t)$, as long as $\omega(s, t)$ is smooth
and bounded away from zero and infinity. The condition that the weight function be bounded
uniformly in the sample size, which is also imposed by Andrews and Shi (2009) and
Kim (2008), turns out to be important (see Armstrong, 2011).
I formalize the notion that $\theta$ is at a point in the identified set such that one or
more of the components of $E(m(W_i, \theta) | X_i)$ is tangent to zero at a finite number of
points in the following assumption.

Assumption 1. For some version of $E(m(W_i, \theta) | X_i)$, the conditional mean of each
element of $m(W_i, \theta)$ takes its minimum only on a finite set
$\{x \mid E(m_j(W_i, \theta) | X = x) = 0 \text{ some } j\} = \mathcal{X}_0 = \{x_1, \ldots, x_\ell\}$.
For each $k$ from 1 to $\ell$, let $J(k)$ be the set of indices $j$ for which
$E(m_j(W_i, \theta) | X = x_k) = 0$. Assume that there exist neighborhoods $B(x_k)$ of each
$x_k \in \mathcal{X}_0$ such that, for each $k$ from 1 to $\ell$, the following assumptions hold.

i.) $E(m_j(W_i, \theta) | X_i)$ is bounded away from zero outside of $\cup_{k=1}^{\ell} B(x_k)$
for all $j$ and, for $j \notin J(k)$, $E(m_j(W_i, \theta) | X_i)$ is bounded away from zero on
$B(x_k)$.

ii.) For $j \in J(k)$, $x \mapsto E(m_j(W_i, \theta) | X = x)$ has continuous second
derivatives inside of the closure of $B(x_k)$ and a positive definite second derivative
matrix $V_j(x_k)$ at each $x_k$.

iii.) $X_i$ has a continuous density $f_X$ on $B(x_k)$.

iv.) Defining $m_{J(k)}(W_i, \theta)$ to have $j$th component $m_j(W_i, \theta)$ if
$j \in J(k)$ and 0 otherwise,
$x \mapsto E(m_{J(k)}(W_i, \theta) m_{J(k)}(W_i, \theta)' | X_i = x)$ is finite and continuous
on $B(x_k)$ for some version of this conditional second moment matrix.

Assumption 1 is the main substantive assumption distinguishing the case considered here
from the case where the KS statistic converges at a $\sqrt{n}$ rate. In the $\sqrt{n}$ case,
some component of $E(m(W_i, \theta) | X_i)$ is equal to zero on a positive probability set.
Assumption 1 states that any component of $E(m(W_i, \theta) | X_i)$ is equal to zero only on a
finite set, and that $X_i$ has a density in a neighborhood of this set, so that this finite
set has probability zero. Note that the assumption that $X_i$ has a density at certain points
means that the moment inequalities must be defined so that $X_i$ does not contain a constant.
Thus, the results stated below hold in the interval regression example with $d_X$ equal to
the number of nonconstant regressors.

Unless otherwise stated, I assume that the contact set $\mathcal{X}_0$ in Assumption 1 is
nonempty. If Assumption 1 holds with $\mathcal{X}_0$ empty so that the conditional mean
$\bar{m}(\theta, x)$ is bounded from below away from zero, $\theta$ will typically be on the
interior of the identified set (as long as the conditional mean stays bounded away from zero
when $\theta$ is moved a small amount). For such values of $\theta$, KS statistics will
converge at a faster rate (see Lemma 6 in the appendix), leading to conservative inference
even if the rates of convergence derived under Assumption 1, which are faster than
$\sqrt{n}$, are used.
In addition to imposing that the minimum of the components of the conditional mean
$\bar{m}(\theta, x)$ over $x$ are taken on a probability zero set, Assumption 1 requires that
this set be finite, and that $\bar{m}(\theta, x)$ behave quadratically in $x$ near this set. I
state results under this condition first, since it is easy to interpret as arising from a
positive definite second derivative matrix at the minimum, and is likely to provide a good
description of many situations encountered in practice. In Section 5, I generalize these
results to other shapes of the conditional mean. This is useful for the tests for rates of
convergence in Section 6, since the rates of convergence turn out to be well behaved enough
to be estimated using adaptations of existing methods.

The next assumption is a regularity condition that bounds $m_j(W_i, \theta)$ by a nonrandom
constant. This assumption will hold naturally in models based on quantile restrictions. In
the interval regression example, it requires that the data have finite support. This
assumption could be replaced with an assumption that $m(W_i, \theta)$ has exponentially
decreasing tails, or even a finite $p$th moment for some potentially large $p$ that would
depend on $d_X$, without much modification of the proof, but the finite support condition is
simpler to state.

Assumption 2. For some nonrandom $\bar Y < \infty$, $|m_j(W_i,\theta)| \le \bar Y$ with probability one for
each $j$.

Finally, I make the following assumption on the function $S$. Part of this assumption could
be replaced by weaker smoothness conditions, but the assumption covers $x \mapsto \|x \wedge 0\|$
for any norm $\|\cdot\|$ as stated, which should suffice for practical purposes.

Assumption 3. $S : \mathbb{R}^{d_Y} \to \mathbb{R}_+$ is continuous and satisfies $S(ax) = aS(x)$ for any
nonnegative scalar $a$.

The following theorem gives the asymptotic distribution and rate of convergence for
$\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t)$ under these conditions. The distribution of
$S(\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t))$ under mild conditions on $S$ then follows as an easy corollary.

Theorem 1. Under Assumptions 1 and 2,

$$n^{(d_X+2)/(d_X+4)} \inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t) \xrightarrow{d} Z$$

where $Z$ is a random vector on $\mathbb{R}^{d_Y}$ defined as follows. Let $G_{P,x_k}(s,t)$, $k = 1,\ldots,\ell$, be
independent mean zero Gaussian processes with sample paths in the space $C(\mathbb{R}^{2d_X}, \mathbb{R}^{d_Y})$ of
continuous functions from $\mathbb{R}^{2d_X}$ to $\mathbb{R}^{d_Y}$ and covariance kernel

$$\mathrm{cov}(G_{P,x_k}(s,t), G_{P,x_k}(s',t')) = E(m_{J(k)}(W_i,\theta) m_{J(k)}(W_i,\theta)' | X_i = x_k) f_X(x_k) \int_{s \vee s' < x < (s+t) \wedge (s'+t')} dx$$

where $m_{J(k)}(W_i,\theta)$ is defined to have $j$th element equal to $m_j(W_i,\theta)$ for $j \in J(k)$ and equal
to zero for $j \notin J(k)$. For $k = 1,\ldots,\ell$, let $g_{P,x_k} : \mathbb{R}^{2d_X} \to \mathbb{R}^{d_Y}$ be defined by

$$g_{P,x_k,j}(s,t) = \frac{1}{2} f_X(x_k) \int_{s_1}^{s_1+t_1} \cdots \int_{s_{d_X}}^{s_{d_X}+t_{d_X}} x' V_j(x_k) x \, dx_{d_X} \cdots dx_1$$

for $j \in J(k)$ and $g_{P,x_k,j}(s,t) = 0$ for $j \notin J(k)$. Define $Z$ to have $j$th element

$$Z_j = \min_{k \text{ s.t. } j \in J(k)} \inf_{(s,t) \in \mathbb{R}^{2d_X}} G_{P,x_k,j}(s,t) + g_{P,x_k,j}(s,t).$$

The asymptotic distribution of $S(\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t))$ follows immediately
from this theorem.

Corollary 1. Under Assumptions 1, 2, and 3,

$$n^{(d_X+2)/(d_X+4)} S\left(\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t)\right) \xrightarrow{d} S(Z)$$

for a random variable $Z$ with the distribution given in Theorem 1.
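As a rough illustration of how the statistic in Corollary 1 could be computed, the following sketch handles the case of a scalar $X_i$ with no ties. After sorting by $X_i$, each interval $(s, s+t)$ selects a contiguous run of observations, so the infimum reduces to a minimum-sum contiguous run. The function names `ks_infimum` and `S_neg_part` are hypothetical, and the sup norm of the negative part is only one choice of $S$ that satisfies Assumption 3.

```python
import numpy as np

def ks_infimum(x, m):
    """inf over intervals (s, s+t) of (1/n) * sum_i m_i * 1{s < x_i < s+t}.

    Scalar x only, assuming no ties. Sorted by x, every interval picks a
    contiguous run of observations, so the infimum is a minimum-sum
    contiguous run (the empty run contributes 0).
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    order = np.argsort(x)
    best = 0.0      # value of the empty interval
    running = 0.0   # smallest sum of a run ending at the current observation
    for value in m[order]:
        running = min(value, running + value)
        best = min(best, running)
    return best / len(x)

def S_neg_part(v):
    """S(v) = sup norm of min(v, 0); continuous and positively homogeneous."""
    v = np.atleast_1d(np.asarray(v, dtype=float))
    return float(np.max(np.abs(np.minimum(v, 0.0))))
```

With several moment functions, `ks_infimum` would be applied to each component of $m(W_i,\theta)$ and the resulting vector passed to `S_neg_part`.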

These results will be useful for constructing asymptotically exact level $\alpha$ tests if the
asymptotic distribution does not have an atom at the $1-\alpha$ quantile, and if the quantiles of
the asymptotic distribution can be estimated. In the next section, I show that the asymptotic
distribution is atomless under mild conditions and propose two methods for estimating
the asymptotic distribution. The first is a generic subsampling procedure. The second is a
procedure based on estimating a finite dimensional set of objects that determine the asymptotic
distribution. This provides feasible methods for constructing asymptotically exact
confidence intervals under Assumption 1. However, while, in many cases, this assumption
characterizes the distribution of $(X_i, m(W_i,\theta))$ for most or all values of $\theta$ on the boundary
of the identified set, it is not an assumption that one would want to impose a priori. Thus,
these tests should be embedded in a procedure that tests between this case and cases where
$E(m(W_i,\theta)|X) = 0$ on a positive probability set, or where $E(m(W_i,\theta)|X)$ is still equal to
$0$ only at finitely many points, but behaves like the absolute value function or something
else near these points rather than a quadratic function. In Section 5, I generalize Theorem 1
to handle a wider set of shapes of the conditional mean, with different rates of convergence
for different cases. In Section 6, I propose procedures for testing for Assumption 1 under
mild smoothness conditions. Combining one of these preliminary tests with inference that
is valid in the corresponding case gives a procedure that is asymptotically valid under more
general conditions. These include tests based on estimating the rate of convergence directly,
which use the results of Section 5.

4 Inference 

To ensure that the asymptotic distribution is continuous, we need to impose additional
assumptions to rule out cases where components of $m(W_i,\theta)$ are degenerate. The next
assumption rules out these cases.

Assumption 4. For each $k$ from $1$ to $\ell$, letting $j_{k,1}, \ldots, j_{k,|J(k)|}$ be the elements in $J(k)$, the
matrix with $q,r$th element given by $E(m_{j_{k,q}}(W_i,\theta) m_{j_{k,r}}(W_i,\theta) | X_i = x_k)$ is invertible.

This assumption simply says that the binding components of $m(W_i,\theta)$ have a nonsingular
conditional covariance matrix at the point where they bind. A sufficient condition for this is
for the conditional covariance matrix of $m(W_i,\theta)$ given $X_i$ to be nonsingular at these points.

I also make the following assumption on the function $S$, which translates continuity of
the distribution of $Z$ to continuity of the distribution of $S(Z)$.

Assumption 5. For any Lebesgue measure zero set $A$, $S^{-1}(A)$ has Lebesgue measure zero.

Under these conditions, the asymptotic distribution in Theorem 1 is continuous. In
addition to showing that the rate derived in that theorem is the exact rate of convergence
(since the distribution is not a point mass at zero or some other value), this shows that
inference based on this asymptotic approximation will be asymptotically exact.

Theorem 2. Under Assumptions 1, 2, and 4, the asymptotic distribution in Theorem 1 is
continuous. If Assumptions 3 and 5 hold as well, the asymptotic distribution in Corollary 1
is continuous.

Thus, an asymptotically exact test of $E(m(W_i,\theta)|X_i) \ge 0$ can be obtained by comparing
$S(\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t))$, scaled by its rate of convergence, to the quantiles of
any consistent estimate of the distribution of $S(Z)$. I propose two methods for estimating this
distribution. The first is a generic subsampling procedure. The second method uses the fact that the
distribution of $Z$ in Theorem 1 depends on the data generating process only through finite
dimensional parameters to simulate an estimate of the asymptotic distribution.

Subsampling is a generic procedure for estimating the distribution of a statistic using
versions of the statistic formed with a smaller sample size (Politis, Romano, and Wolf, 1999).
Since many independent smaller samples are available, these can be used to estimate the
distribution of the original statistic as long as the distribution of the scaled statistic is
stable as a function of the sample size. To describe the subsampling procedure, let
$T_n(\theta) = \inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t)$. For any set of indices $\mathcal{S} \subseteq \{1,\ldots,n\}$, define
$T_{\mathcal{S}}(\theta) = \inf_{s,t} \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} m(W_i,\theta) I(s < X_i < s+t)$. The subsampling estimate of
$P(S(Z) \le t)$ is, for some subsample size $b$,

$$\frac{1}{\binom{n}{b}} \sum_{|\mathcal{S}|=b} I\left(b^{(d_X+2)/(d_X+4)} S(T_{\mathcal{S}}(\theta)) \le t\right).$$

One can also estimate the null distribution using the centered subsampling estimate

$$\frac{1}{\binom{n}{b}} \sum_{|\mathcal{S}|=b} I\left(b^{(d_X+2)/(d_X+4)} [S(T_{\mathcal{S}}(\theta)) - S(T_n(\theta))] \le t\right).$$

For some nominal level $\alpha$, let $\hat q_{b,1-\alpha}$ be the $1-\alpha$ quantile of either of these subsampling
distributions. We reject the null hypothesis that $\theta$ is in the identified set at level $\alpha$ if
$n^{(d_X+2)/(d_X+4)} S(T_n(\theta)) > \hat q_{b,1-\alpha}$ and fail to reject otherwise. The following theorem states
that this procedure is asymptotically exact. The result follows immediately from general
results for subsampling in Politis, Romano, and Wolf (1999).



Theorem 3. Under Assumptions 1, 2, 3, 4, and 5, the probability of rejecting using the
subsampling procedure described above with nominal level $\alpha$ converges to $\alpha$ as long as $b \to \infty$
and $b/n \to 0$.
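The subsampling test can be sketched as follows, under two simplifying assumptions that are not part of the procedure as stated: a fixed number of random subsets of size $b$ stands in for the average over all $\binom{n}{b}$ subsets, and the statistic is supplied as a hypothetical callable `stat_fn` that maps an array of observation indices to $S(T_{\mathcal{S}}(\theta))$. The uncentered version is shown.

```python
import numpy as np

def subsample_critical_value(stat_fn, n, b, rate_exp, alpha=0.05, n_draws=500, seed=0):
    """Approximate 1 - alpha quantile of b**rate_exp * stat_fn(subsample).

    stat_fn(idx) should return S(T_S(theta)) for the observations in idx.
    Random subsets of size b approximate the average over all such subsets.
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for r in range(n_draws):
        idx = rng.choice(n, size=b, replace=False)
        draws[r] = b ** rate_exp * stat_fn(idx)
    return float(np.quantile(draws, 1.0 - alpha))

def subsampling_test(stat_fn, n, b, d_x=1, alpha=0.05, **kwargs):
    """Reject when n**rate * S(T_n(theta)) exceeds the subsampling critical value."""
    rate_exp = (d_x + 2) / (d_x + 4)
    crit = subsample_critical_value(stat_fn, n, b, rate_exp, alpha, **kwargs)
    full_stat = n ** rate_exp * stat_fn(np.arange(n))
    return full_stat > crit, full_stat, crit
```

The conditions $b \to \infty$ and $b/n \to 0$ are asymptotic, so they do not pin down $b$ in a given sample; the choice is left to the user in this sketch.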

While subsampling is valid under general conditions, subsampling estimates may be less
precise than estimates based on knowledge of how the asymptotic distribution relates to the
data generating process. One possibility is to note that the asymptotic distribution in Theorem 1
depends on the underlying distribution only through the set $\mathcal{X}_0$ and, for points $x_k$ in $\mathcal{X}_0$,
the density $f_X(x_k)$, the conditional second moment matrix
$E(m_{J(k)}(W_i,\theta) m_{J(k)}(W_i,\theta)' | X = x_k)$, and the second derivative matrix $V(x_k)$ of the
conditional mean. Thus, with consistent estimates of these objects, we can estimate the
distribution in Theorem 1 by replacing these objects with their consistent estimates and
simulating from the corresponding distribution.
In order to accommodate different methods of estimating $f_X(x_k)$,
$E(m_{J(k)}(W_i,\theta) m_{J(k)}(W_i,\theta)' | X = x_k)$, and $V(x_k)$, I state the consistency of these estimators
as a high level condition, and show that the procedure works as long as these estimators are
consistent. Since these objects only appear as
$E(m_{J(k)}(W_i,\theta) m_{J(k)}(W_i,\theta)' | X = x_k) f_X(x_k)$ and $f_X(x_k) V(x_k)$ in the asymptotic
distribution, we actually only need consistent estimates of these products.

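Assumption 6. The estimates $\hat M_k(x_k)$, $\hat f_X(x_k)$, and $\hat V(x_k)$ satisfy
$\hat f_X(x_k) \hat V(x_k) \xrightarrow{p} f_X(x_k) V(x_k)$ and
$\hat M_k(x_k) \hat f_X(x_k) \xrightarrow{p} E(m_{J(k)}(W_i,\theta) m_{J(k)}(W_i,\theta)' | X = x_k) f_X(x_k)$.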

For $k$ from $1$ to $\ell$, let $\hat G_{P,x_k}(s,t)$ and $\hat g_{P,x_k}(s,t)$ be the random process and mean function
defined in the same way as $G_{P,x_k}(s,t)$ and $g_{P,x_k}(s,t)$, but with the estimated quantities
replacing the true quantities. We estimate the distribution of $Z$ defined to have $j$th element

$$Z_j = \min_{k \text{ s.t. } j \in J(k)} \inf_{(s,t) \in \mathbb{R}^{2d_X}} G_{P,x_k,j}(s,t) + g_{P,x_k,j}(s,t)$$

using the distribution of $\hat Z$ defined to have $j$th element

$$\hat Z_j = \min_{k \text{ s.t. } j \in J(k)} \inf_{\|(s,t)\| \le B_n} \hat G_{P,x_k,j}(s,t) + \hat g_{P,x_k,j}(s,t)$$

for some sequence $B_n$ going to infinity. The convergence of the distribution of $\hat Z$ to the
distribution of $Z$ is in the sense of conditional weak convergence in probability often used
in proofs of the validity of the bootstrap (see, for example, Lehmann and Romano, 2005).
From this, it follows that tests that replace the quantiles of $S(Z)$ with the quantiles of $S(\hat Z)$
are asymptotically exact under the conditions that guarantee the continuity of the limiting
distribution.

Theorem 4. Under Assumption 6, $\rho(\hat Z, Z) \xrightarrow{p} 0$ where $\rho$ is any metric on probability
distributions that metrizes weak convergence.

Corollary 2. Let $\hat q_{1-\alpha}$ be the $1-\alpha$ quantile of $S(\hat Z)$. Then, under Assumptions 1, 2, 3, 4, 5,
and 6, the test that rejects when $n^{(d_X+2)/(d_X+4)} S(T_n(\theta)) > \hat q_{1-\alpha}$ and fails to reject otherwise
is an asymptotically exact level $\alpha$ test.
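To make the simulation step concrete, here is a minimal sketch for the simplest case: $d_X = 1$, a single moment function and a single contact point $x_k$. It relies on an observation that is not stated in the text but follows from the covariance kernel in Theorem 1: with scalar $x$, the kernel is proportional to the length of the overlap of $(s, s+t)$ and $(s', s'+t')$, so the Gaussian process can be written through increments of a two-sided Brownian motion $W$. The function name `simulate_Z`, the grid, and the truncation of interval endpoints to $[-B, B]$ are all assumptions of the sketch.

```python
import numpy as np

def simulate_Z(sigma2_f, f_V, B=5.0, n_grid=2001, n_sims=1000, seed=0):
    """Draws approximating Z for d_X = 1, one moment, one contact point.

    sigma2_f : estimate of E(m^2 | X = x_k) * f_X(x_k)
    f_V      : estimate of f_X(x_k) * V(x_k)
    Uses G(s, t) = sqrt(sigma2_f) * (W(s + t) - W(s)), which has the
    overlap-length covariance, and g(s, t) = f_V * ((s + t)**3 - s**3) / 6.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(-B, B, n_grid)
    step = grid[1] - grid[0]
    drift = f_V * grid ** 3 / 6.0
    draws = np.empty(n_sims)
    for r in range(n_sims):
        # Brownian path on the grid; only increments matter, so the origin is arbitrary.
        increments = rng.normal(0.0, np.sqrt(step), n_grid - 1)
        W = np.concatenate(([0.0], np.cumsum(increments)))
        H = np.sqrt(sigma2_f) * W + drift
        # inf over a <= b of H(b) - H(a) equals min over b of H(b) - max_{a <= b} H(a)
        draws[r] = np.min(H - np.maximum.accumulate(H))
    return draws
```

A critical value would then be the $1-\alpha$ quantile of $S$ applied to the draws, for example `np.quantile(-simulate_Z(s2f, fV), 0.95)` when $S(z) = |z \wedge 0|$. The grid spacing and $B$ control the approximation error and would need to grow with the sample size.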

If the set $\mathcal{X}_0$ is known, the quantities needed to compute $\hat Z$ can be estimated consistently
using standard methods for nonparametric estimation of densities, conditional moments, and
their derivatives. However, typically $\mathcal{X}_0$ is not known, and the researcher will not even want
to impose that this set is finite. In Section 6, I propose methods for testing Assumption 1 and
estimating the set $\mathcal{X}_0$ under weaker conditions on the smoothness of the conditional mean.
These conditions allow for both the $n^{(d_X+2)/(d_X+4)}$ asymptotics that arise from Assumption 1
and the $\sqrt{n}$ asymptotics that arise from a positive probability contact set.

Before describing these results, I extend the results of Section 3 to other shapes of the
conditional mean. These results are needed for the tests in Section 6.1, which rely on the
rate of convergence being sufficiently well behaved if it is in a certain range.



5 Other Shapes of the Conditional Mean 

Assumption 1 states that the components of the conditional mean $\bar m(\theta,x)$ are minimized
on a finite set and have strictly positive second derivative matrices at the minimum. More
generally, if the conditional mean is less smooth, or does not take an interior minimum,
$\bar m(\theta,x)$ could be minimized on a finite set, but behave differently near the minimum. Another
possibility is that the minimizing set could have zero probability, while containing infinitely
many elements (for example, an infinite countable set, or a lower dimensional set when
$d_X > 1$).

In this section, I derive the asymptotic distribution and rate of convergence of KS statistics
under a broader class of shapes of the conditional mean $\bar m(\theta,x)$. I replace part (ii) of
Assumption 1 with the following assumption.

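Assumption 7. For $j \in J(k)$, $\bar m_j(\theta,x) = E(m_j(W_i,\theta)|X = x)$ is continuous on $B(x_k)$ and
satisfies

$$\sup_{\|x - x_k\| \le \delta} \left| \frac{\bar m_j(\theta,x) - \bar m_j(\theta,x_k)}{\|x - x_k\|^{\gamma(j,k)}} - \psi_{j,k}\left(\frac{x - x_k}{\|x - x_k\|}\right) \right| \to 0 \quad \text{as } \delta \to 0$$

for some $\gamma(j,k) > 0$ and some function $\psi_{j,k} : \{t \in \mathbb{R}^{d_X} \,|\, \|t\| = 1\} \to \mathbb{R}$ with
$\bar\psi \ge \psi_{j,k}(t) \ge \underline{\psi}$ for some $\bar\psi < \infty$ and $\underline{\psi} > 0$. For future reference, define
$\gamma = \max_{j,k} \gamma(j,k)$ and $\tilde J(k) = \{j \in J(k) \,|\, \gamma(j,k) = \gamma\}$.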

When Assumption 7 holds, the rate of convergence will be determined by $\gamma$, and the
asymptotic distribution will depend on the local behavior of the objective function for $j$ and
$k$ with $j \in \tilde J(k)$.

Under Assumption 1, Assumption 7 will hold with $\gamma = 2$ and $\psi_{j,k}(t) = \frac{1}{2} t' V_j(x_k) t$ (this
holds by a second order Taylor expansion, as described in the appendix). For $\gamma = 1$,
Assumption 7 states that $\bar m_j(\theta,x)$ has a directional derivative for every direction, with
the approximation error going to zero uniformly in the direction of the derivative. More
generally, Assumption 7 states that $\bar m_j(\theta,x)$ increases like $\|x - x_k\|^{\gamma}$ near elements $x_k$ in the
minimizing set $\mathcal{X}_0$. For $d_X = 1$, this follows from simple conditions on the higher derivatives
of the conditional mean with respect to $x$. With enough derivatives, the first derivative that
is nonzero uniformly on the support of $X_i$ determines $\gamma$. I state this formally in the next
theorem. For higher dimensions, Assumption 7 requires additional conditions to rule out
contact sets of dimension less than $d_X$, but greater than 1.

Theorem 5. Suppose $\bar m(\theta,x)$ has $p$ bounded derivatives, $d_X = 1$ and $\mathrm{supp}(X_i) = [\underline{x}, \bar x]$.
Then, if $\min_j \inf_x \bar m_j(\theta,x) = 0$, either Assumption 7 holds, with the contact set possibly
containing the boundary points $\underline{x}$ and $\bar x$, for $\gamma = r$ for some integer $r < p$, or, for some $x_0$
on the support of $X_i$ and some finite $B$, $\bar m_j(\theta,x) \le B|x - x_0|^p$ for some $j$.

Theorem 5 states that, with $d_X = 1$ and $p$ bounded derivatives, either Assumption 7
holds for $\gamma$ some integer less than $p$, or, for some $j$, $\bar m_j(\theta,x)$ is less than or equal to the
function $B|x - x_0|^p$, which would make Assumption 7 hold for $\gamma = p$. In the latter case, the
rate of convergence for the KS statistic must be at least as slow as the rate of convergence
when Assumption 7 holds with $\gamma = p$. While an interior minimum with a strictly positive
second derivative or a minimum at $\underline{x}$ or $\bar x$ with a nonzero first derivative seem most likely,
Theorem 5 shows that Assumption 7 holds under broader conditions on the smoothness of
the conditional mean. This, along with the rates of convergence in Theorem 6 below, will be
useful for the methods described later in Section 6 for testing between rates of convergence.
With enough smoothness assumptions on the conditional mean, the rate of convergence
will either be $n^{\beta}$ for $\beta$ in some known range, or strictly slower than $n^{\beta}$ for some known $\beta$.
With this prior knowledge of the possible types of asymptotic behavior of $T_n(\theta)$ in hand,
one can use a modified version of the estimators of the rate of convergence proposed by
Politis, Romano, and Wolf (1999) to estimate $\gamma$ in Assumption 7, and to test whether this
assumption holds.

Under Assumption 1 with part (ii) replaced by Assumption 7, the following modified
version of Theorem 1, with a different rate of convergence and limiting distribution, will
hold.

Theorem 6. Under Assumption 1, with part (ii) replaced by Assumption 7, and Assumption 2,

$$n^{(d_X+\gamma)/(d_X+2\gamma)} \inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t) \xrightarrow{d} Z$$

where $Z$ is the random vector on $\mathbb{R}^{d_Y}$ defined as in Theorem 1, but with $J(k)$ replaced by
$\tilde J(k)$ and $g_{P,x_k,j}(s,t)$ defined as

$$g_{P,x_k,j}(s,t) = f_X(x_k) \int_{s_1}^{s_1+t_1} \cdots \int_{s_{d_X}}^{s_{d_X}+t_{d_X}} \psi_{j,k}\left(\frac{x}{\|x\|}\right) \|x\|^{\gamma} \, dx_{d_X} \cdots dx_1$$

for $j \in \tilde J(k)$. If Assumption 3 holds as well, then

$$n^{(d_X+\gamma)/(d_X+2\gamma)} S\left(\inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t)\right) \xrightarrow{d} S(Z).$$

If Assumption 4 holds as well, $Z$ has a continuous distribution. If Assumptions 3, 4, and 5
hold, $S(Z)$ has a continuous distribution.
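For a sense of how the exponent $(d_X+\gamma)/(d_X+2\gamma)$ varies with the shape of the conditional mean, the following small sketch evaluates it exactly for integer $d_X$ and $\gamma$. The helper name `rate_exponent` is hypothetical, and restricting to integers is only a convenience of the sketch.

```python
from fractions import Fraction

def rate_exponent(d_x, gamma):
    """Exponent (d_x + gamma) / (d_x + 2 * gamma) of n, for integer inputs."""
    return Fraction(d_x + gamma, d_x + 2 * gamma)

# Flatter minima (larger gamma) give exponents closer to 1/2.
for gamma in (1, 2, 4):
    print(gamma, rate_exponent(1, gamma))
```

With $d_X = 1$, the quadratic case $\gamma = 2$ gives $3/5$, matching $(d_X+2)/(d_X+4)$ in Theorem 1, and the exponent stays strictly above $1/2$ for every finite $\gamma$.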

Theorem 6 can be used once Assumption 7 is known to hold for some $\gamma$, as long as $\gamma$ can
be estimated. I treat this topic in the next section. Theorem 5 gives primitive conditions for
this to hold for the case where $d_X = 1$ that rely only on the smoothness of the conditional
mean. The only additional condition needed to use this theorem is to verify that the set
$\mathcal{X}_0$ does not contain the boundary points $\underline{x}$ and $\bar x$. In fact, the requirement in Theorems
1 and 6 that $\mathcal{X}_0$ not contain boundary points could be relaxed, as long as the boundary is
sufficiently smooth. The results will be similar as long as the density of $X_i$ is bounded away
from zero on its support, and cases where the density of $X_i$ converges to zero smoothly near
the boundary of its support could be handled using a transformation of the data (see Armstrong, 2011, for
an example of this approach in a slightly different setting). Alternatively, a pre-test can be
done to see if the conditional mean is bounded away from zero near the boundary of the
support of $X_i$ so that these results can be used as stated.



6 Testing Rate of Convergence Conditions 

The $n^{(d_X+2)/(d_X+4)}$ convergence derived in Section 3 holds when the minimum of $\bar m_j(\theta,x) =
E(m_j(W_i,\theta)|X_i = x)$ is taken at a finite number of points, each with a strictly positive
definite second derivative matrix. The results in Section 5 extend these results to other shapes
of the conditional mean near the contact set, which result in different rates of convergence.
In contrast, if the minimum is taken on a positive probability set, convergence will be at the
slower $\sqrt{n}$ rate. Under additional conditions on the smoothness of $\bar m_j(\theta,x)$ as a function of
$x$, it is possible to test for the conditions that lead to the faster convergence rates. In this
section, I describe two methods for testing between these conditions. In Section 6.1, I describe
tests that use a generic test for rates of convergence based on subsampling proposed by
Politis, Romano, and Wolf (1999). These tests are valid as long as the KS statistic converges
to a nondegenerate distribution at some polynomial rate, or converges more slowly than some
imposed rate, and the results in Section 5 give primitive conditions for this. In Section 6.2,
I propose tests of Assumption 1 based on estimating the second derivative matrix of the
conditional mean.



6.1 Tests Based on Estimating the Rate of Convergence Directly



The pre-tests proposed in this section mostly follow Chapter 8 of Politis, Romano, and Wolf
(1999), using the results in Section 5 to give primitive conditions under which the rate of
convergence will be well behaved so that these results can be applied, with some modifications
to accommodate the possibility that the statistic may not converge at a polynomial rate if the
rate is slow enough. Following the notation of Politis, Romano, and Wolf (1999), define

$$L_{n,b}(x|\tau) \equiv \frac{1}{\binom{n}{b}} \sum_{|\mathcal{S}|=b} I\left(\tau_b [S(T_{\mathcal{S}}(\theta)) - S(T_n(\theta))] \le x\right)$$

for any sequence $\tau_n$, and define

$$L_{n,b}(x|1) \equiv \frac{1}{\binom{n}{b}} \sum_{|\mathcal{S}|=b} I\left(S(T_{\mathcal{S}}(\theta)) - S(T_n(\theta)) \le x\right).$$



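Let

$$L^{-1}_{n,b}(t|1) = \inf\{x \,|\, L_{n,b}(x|1) \ge t\}$$

be the $t$th quantile of $L_{n,b}(x|1)$, and define $L^{-1}_{n,b}(t|\tau)$ similarly. Note that
$\tau_b L^{-1}_{n,b}(t|1) = L^{-1}_{n,b}(t|\tau)$. If $\tau_n$ is the true rate of convergence, $L^{-1}_{n,b_1}(t|\tau)$ and
$L^{-1}_{n,b_2}(t|\tau)$ both approximate the $t$th quantile of the asymptotic distribution. Thus, if
$\tau_n = n^{\beta}$ for some $\beta$, $b_1^{\beta} L^{-1}_{n,b_1}(t|1)$ and $b_2^{\beta} L^{-1}_{n,b_2}(t|1)$ should be approximately equal,
so that an estimator $\hat\beta$ for $\beta$ can be formed by choosing $\hat\beta$ to set these quantities equal.
Some calculation gives

$$\hat\beta = \frac{\log L^{-1}_{n,b_1}(t|1) - \log L^{-1}_{n,b_2}(t|1)}{\log b_2 - \log b_1}. \qquad (1)$$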
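The estimator in (1) can be sketched numerically as follows, assuming the centered subsample draws at the two block sizes have already been computed (for instance with random subsets in place of all subsets). The function names are hypothetical, and the quantile is taken as the left-continuous inverse of the empirical distribution of the draws.

```python
import numpy as np

def empirical_quantile(draws, t):
    """inf{x : F(x) >= t} for the empirical distribution F of the draws."""
    xs = np.sort(np.asarray(draws, dtype=float))
    k = int(np.ceil(t * len(xs))) - 1
    return xs[min(max(k, 0), len(xs) - 1)]

def estimate_rate_exponent(draws_b1, draws_b2, b1, b2, t=0.9):
    """Equation (1) from centered subsample draws at block sizes b1 and b2.

    draws_bk holds S(T_S(theta)) - S(T_n(theta)) over subsamples of size bk.
    Both t-th quantiles must be strictly positive for the logs to exist.
    """
    q1 = empirical_quantile(draws_b1, t)
    q2 = empirical_quantile(draws_b2, t)
    if q1 <= 0 or q2 <= 0:
        raise ValueError("quantiles must be positive to take logs")
    return float((np.log(q1) - np.log(q2)) / (np.log(b2) - np.log(b1)))
```

If the draws at block size $b$ scale exactly like $b^{-\beta}$ times a common distribution, the function returns $\beta$; in practice the estimate is noisy and depends on the quantile level `t`.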



The estimator in (1) is a special case of the class of estimators described in Politis, Romano,
and Wolf (1999), which allow averaging of more than two block sizes and more than one quantile (these
estimators could be used here as well).

Note that the estimate $L_{n,b}(x|\tau)$ centers the subsampling draws around the KS statistic
$S(T_n(\theta))$ rather than its limiting value, 0. This is necessary for the rate of convergence
estimate not to diverge under fixed alternatives. Once the rate of convergence is known or
estimated, either $L_{n,b}(x|\tau)$ or an uncentered version, defined as

$$\tilde L_{n,b}(x|\tau) \equiv \frac{1}{\binom{n}{b}} \sum_{|\mathcal{S}|=b} I\left(\tau_b S(T_{\mathcal{S}}(\theta)) \le x\right),$$

can be used to estimate the null distribution of the scaled statistic.

The results in Politis, Romano, and Wolf (1999) show that subsampling with the estimated
rate of convergence $n^{\hat\beta}$ is valid as long as the true rate of convergence is $n^{\beta}$ for some
$\beta > 0$. However, this will not always be the case for the estimators considered in this paper.
For example, under the conditions of Theorem 5, the rate of convergence will either be
$n^{(1+\gamma)/(1+2\gamma)}$ for some $\gamma < p$ (here, $d_X = 1$), or the rate of convergence will be at least as
slow as $n^{(1+p)/(1+2p)}$, but may converge at a slower rate, or oscillate between slower rates of
convergence. Even if Assumption 7 holds for some $\gamma$ for $\theta$ on the boundary of the identified
set, the rate of convergence will be faster for $\theta$ on the interior of the identified set, where
trying not to be conservative typically has little payoff in terms of power against parameters
outside of the identified set.

To remedy these issues, I propose truncated versions defined as follows. For some
$1/2 < \underline{\beta} < \bar\beta < 1$, let $\hat\beta$ be the estimate given by (1) for $b_1 = n^{\chi_1}$ and $b_2 = n^{\chi_2}$ for some
$1 > \chi_1 > \chi_2 > 0$, and let $\hat\beta_a$ be the estimate given by (1) for $b_2 = n^{\chi_a}$ for some $1 > \chi_a > 0$ and $b_1$
some fixed constant that does not change with the sample size (if $L^{-1}_{n,b}(t|1) = 0$, replace
this with an arbitrary positive constant in the formula for $\hat\beta_a$ so that $\hat\beta_a$ is well defined). The
test described in the theorem below uses $\hat\beta_a$ to test whether the rate of convergence is slow
enough that the conservative rate $n^{1/2}$ should be used, and uses $\hat\beta$ to estimate the rate of
convergence otherwise, as long as it is not implausibly large. If the rate of convergence is
estimated to be larger than $\bar\beta$ (which, for large enough $\bar\beta$, will typically only occur on the
interior of the identified set), the estimate is truncated to $\bar\beta$. When the rate of convergence
is only known to be either $n^{\beta}$ for some $\beta \in [\underline{\beta}, \bar\beta]$, or either slower than $n^{\underline{\beta}}$ or faster than $n^{\bar\beta}$,
this procedure provides a conservative approach that is still asymptotically exact when the
exponent of the rate of convergence is in $(\underline{\beta}, \bar\beta)$.
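The truncation logic just described amounts to a short decision rule. The sketch below is one way to write it, with hypothetical argument names; it only selects the exponent to scale by and leaves the choice of the conservative $n^{1/2}$ test to the caller.

```python
def choose_rate_exponent(beta_hat, beta_hat_a, beta_lower, beta_upper):
    """Pick the exponent used to scale the KS statistic.

    beta_hat_a below beta_lower signals a slow rate, so the conservative
    exponent 1/2 is used; otherwise beta_hat is used, capped at beta_upper.
    """
    if beta_hat_a < beta_lower:
        return 0.5, "conservative"
    return min(beta_hat, beta_upper), "estimated"
```

For example, with bounds $0.55$ and $0.65$, an estimate $\hat\beta = 0.7$ is truncated to $0.65$ whenever $\hat\beta_a \ge 0.55$.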

Theorem 7. Suppose that Assumptions 2, 3, and 5 hold, and that $S$ is convex and
$E(m(W_i,\theta) m(W_i,\theta)' | X_i = x)$ is continuous and strictly positive definite. Suppose that, for
some $\bar\gamma$, Assumptions 1 and 7 hold with part (ii) of Assumption 1 replaced by Assumption 7
for some $\gamma \le \bar\gamma$, where the set $\mathcal{X}_0 = \{x \,|\, \bar m_j(\theta,x) = 0 \text{ some } j\}$ may be empty, or, for some
$x_0 \in \mathcal{X}_0$ such that $X_i$ has a continuous density in a neighborhood of $x_0$ and $B < \infty$,
$\bar m_j(\theta,x) \le B\|x - x_0\|^{\gamma}$ for some $\gamma > \bar\gamma$ and some $j$.

Let $\bar\beta = (d_X + \underline{\gamma})/(d_X + 2\underline{\gamma})$ for some $\underline{\gamma} < \bar\gamma$ and let $\underline{\beta} = (d_X + \bar\gamma)/(d_X + 2\bar\gamma)$.
Let $\hat\beta$ and $\hat\beta_a$ be defined as above for some $0 < \chi_1 < \chi_2 < 1$ and $0 < \chi_a < 1$. Consider the
following test. If $\hat\beta_a \ge \underline{\beta}$, reject if
$n^{\hat\beta \wedge \bar\beta} S(T_n(\theta)) > L^{-1}_{n,b}(1-\alpha \,|\, b^{\hat\beta \wedge \bar\beta})$ (or if
$n^{\hat\beta \wedge \bar\beta} S(T_n(\theta)) > \tilde L^{-1}_{n,b}(1-\alpha \,|\, b^{\hat\beta \wedge \bar\beta})$)
where $b = n^{\chi_3}$ for some $0 < \chi_3 < 1$. If $\hat\beta_a < \underline{\beta}$, perform any (possibly conservative)
asymptotically level $\alpha$ test that compares $n^{1/2} S(T_n(\theta))$ to a critical value that is bounded
away from zero.

Under these conditions, this test is asymptotically level $\alpha$. If Assumption 1 holds with
part (ii) of Assumption 1 replaced by Assumption 7 for some $\underline{\gamma} < \gamma < \bar\gamma$ and $\mathcal{X}_0$ nonempty,
this test will be asymptotically exact level $\alpha$.

In the one dimensional case, the conditions of Theorem 7 follow immediately from smoothness
assumptions on the conditional mean by Theorem 5. As discussed above, the condition
that the minimum not be taken on the boundary of the support of $X_i$ could be removed, or
the result can be used as stated with a pre-test for this condition.

Theorem 8. Suppose that $d_X = 1$, Assumptions 2, 3, and 5 hold, and that $S$ is convex
and $E(m(W_i,\theta) m(W_i,\theta)' | X_i = x)$ is continuous and strictly positive definite. Suppose that
$\mathrm{supp}(X_i) = [\underline{x}, \bar x]$ and that $\bar m(\theta,x)$ is bounded away from zero near $\underline{x}$ and $\bar x$ and has $p$ bounded
derivatives. Then the conditions of Theorem 7 hold for any $\bar\gamma < p$.



6.2 Tests Based on Estimating the Second Derivative 

I make the following assumptions on the conditional mean and the distribution of $X_i$. These
conditions are used to estimate the second derivatives of $\bar m_j(\theta,x) = E(m_j(W_i,\theta)|X_i = x)$,
and the results are stated for local polynomial estimates. The conditions and results here are
from Ichimura and Todd (2007). Other nonparametric estimators of conditional means and
their derivatives and conditions for uniform convergence of such estimators could be used
instead. The results in this section related to testing Assumption 1 are stated for $m_j(W_i,\theta)$
for a fixed index $j$. The consistency of a procedure that combines these tests for each $j$ then
follows from the consistency of the test for each $j$.

Assumption 8. The third derivatives of $\bar m_j(\theta,x)$ with respect to $x$ are Lipschitz continuous
and uniformly bounded.

Assumption 9. $X_i$ has a uniformly continuous density $f_X$ such that, for some compact set
$D \subset \mathbb{R}^{d_X}$, $\inf_{x \in D} f_X(x) > 0$, and $E(m_j(W_i,\theta)|X_i)$ is bounded away from zero outside of $D$.

Assumption 10. The conditional density of $X_i$ given $m_j(W_i,\theta)$ exists and is uniformly
bounded.

Note that Assumption 10 is on the density of $X_i$ given $m_j(W_i,\theta)$, and not the other way
around, so that, for example, count data for the dependent variable in an interval regression
is okay.

Let $\mathcal{X}_0^j$ be the set of minimizers of $\bar m_j(\theta,x)$ if this function is less than or equal to $0$ for
some $x$ and the empty set otherwise. In order to test Assumption 1, I first note that, if the
conditional mean is smooth, the positive definiteness of the second derivative matrix on the
contact set will imply that the contact set is finite. This reduces the problem to determining
whether the second derivative matrix is positive definite on the set of minimizers of $\bar m_j(\theta,x)$,
a problem similar to testing local identification conditions in nonlinear models (see Wright,
2003). I record this observation in the following lemma.



Lemma 1. Under Assumptions 8 and 9, if the second derivative matrix of $E(m_j(W_i,\theta)|X_i = x)$
is strictly positive definite on $\mathcal{X}_0^j$, then $\mathcal{X}_0^j$ must be finite.

According to Lemma 1, once we know that the second derivative matrix of $E(m_j(W_i,\theta)|X_i)$
is positive definite on the set of minimizers of $E(m_j(W_i,\theta)|X_i)$, the conditions of Theorem 1
will hold. This reduces the problem to testing the conditions of the lemma. One simple way
of doing this is to take a preliminary estimate of $\mathcal{X}_0^j$ that contains this set with probability
approaching one, and then test whether the second derivative matrix of $E(m_j(W_i,\theta)|X_i)$ is
positive definite on this set. In what follows, I describe an approach based on local polynomial
regression estimates of the conditional mean and its second derivatives, but other
methods of estimating the conditional mean would work under appropriate conditions. The
methods require knowledge of a set $D$ satisfying Assumption 9. This set could be chosen
with another preliminary test, an extension which I do not pursue.

Under the conditions above, we can estimate $\bar m_j(\theta,x)$ and its derivatives at a given point
$x$ with a local second order polynomial regression estimator defined as follows. For a kernel
function $K$ and a bandwidth parameter $h$, run a regression of $m_j(W_i,\theta)$ on a second order
polynomial of $X_i$, weighted by the distance of $X_i$ from $x$ through $K((X_i - x)/h)$. That is, for each
$j$ and any $x$, define $\hat{\bar m}_j(\theta,x)$, $\hat\beta_j(x)$, and $\hat V_j(x)$ to be the values of $m$, $\beta$, and $V$ that minimize

$$E_n \left\{ \left[ m_j(W_i,\theta) - \left( m + (X_i - x)'\beta + \frac{1}{2}(X_i - x)' V (X_i - x) \right) \right]^2 K((X_i - x)/h) \right\}.$$

The pre-test uses $\hat{\bar m}_j(\theta,x)$ as an estimate of $\bar m_j(\theta,x)$ and $\hat V_j(x)$ as an estimate of $V_j(x)$.
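A minimal sketch of the local second order polynomial fit for scalar $X_i$ is given here, written as weighted least squares. The function name `local_quadratic` is hypothetical, and the Epanechnikov kernel is only one bounded, compactly supported choice; the text does not prescribe a kernel.

```python
import numpy as np

def local_quadratic(x0, X, Y, h):
    """Local second-order polynomial fit at x0 for scalar X.

    Returns (m_hat, beta_hat, V_hat): fitted level, first derivative and
    second derivative at x0, using Epanechnikov kernel weights.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    u = (X - x0) / h
    w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    d = X - x0
    # Columns match m + d * beta + 0.5 * d**2 * V in the objective.
    design = np.column_stack([np.ones_like(d), d, 0.5 * d ** 2])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], Y * sw, rcond=None)
    return coef[0], coef[1], coef[2]
```

Here `Y` plays the role of $m_j(W_i,\theta)$. For $d_X > 1$ the design matrix would also need the cross-product terms of $X_i - x$, which this sketch omits.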

The following theorem, taken from Ichimura and Todd (2007, Theorem 4.1), gives rates
of convergence for these estimates of the conditional mean and its second derivatives that
will be used to estimate $\mathcal{X}_0^j$ and $V_j(x)$ as described above. The theorem uses an additional
assumption on the kernel $K$.

Assumption 11. The kernel function $K$ is bounded, has compact support, and satisfies, for
some $C$ and for any $0 \le j_1 + \cdots + j_r \le 5$, $|u_1^{j_1} \cdots u_r^{j_r} K(u) - v_1^{j_1} \cdots v_r^{j_r} K(v)| \le C\|u - v\|$.

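Theorem 9. Under iid data and Assumptions 2, 8, 9, 10, and 11,

$$\sup_{x \in D} \left| \hat V_{j,rs}(x) - V_{j,rs}(x) \right| = O_p\left((\log n/(n h^{d_X+4}))^{1/2}\right) + O_p(h)$$

for all $r$ and $s$, where $V_{j,rs}$ is the $r,s$ element of $V_j$, and

$$\sup_{x \in D} \left| \hat{\bar m}_j(\theta,x) - \bar m_j(\theta,x) \right| = O_p\left((\log n/(n h^{d_X}))^{1/2}\right) + O_p(h^3).$$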

For both the conditional mean and the derivative, the first term in the asymptotic order
of convergence is the variance term and the second is the bias term. The optimal choice of
$h$ sets both of these to be the same order, and is $h_n = (\log n/n)^{1/(d_X+6)}$ in both cases. This
gives a $(\log n/n)^{1/(d_X+6)}$ rate of convergence for the second derivative, and a $(\log n/n)^{3/(d_X+6)}$
rate of convergence for the conditional mean. However, any choice of $h$ such that both terms
go to zero can be used.
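For completeness, the balancing argument behind this bandwidth can be written out as follows, where $\asymp$ denotes equality of orders. It assumes the variance and bias orders for the local quadratic estimates are $(\log n/(n h^{d_X+4}))^{1/2}$ and $h$ for the second derivative, and $(\log n/(n h^{d_X}))^{1/2}$ and $h^3$ for the conditional mean.

```latex
\left(\frac{\log n}{n h^{d_X+4}}\right)^{1/2} \asymp h
  \iff h^{d_X+6} \asymp \frac{\log n}{n},
\qquad
\left(\frac{\log n}{n h^{d_X}}\right)^{1/2} \asymp h^{3}
  \iff h^{d_X+6} \asymp \frac{\log n}{n},
\qquad\text{so}\qquad
h_n = \left(\frac{\log n}{n}\right)^{1/(d_X+6)} .
```

Both conditions lead to the same bandwidth, and substituting $h_n$ into the bias terms $h$ and $h^3$ gives the two rates.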

In order to test the conditions of Lemma 1, we can use the following procedure. For some
sequence $a_n$ growing to infinity such that $a_n[(\log n/(n h^{d_X}))^{1/2} \vee h^3]$ converges to zero, let
$\hat{\mathcal{X}}_0^j = \{x \in D \,|\, |\hat{\bar m}_j(\theta,x) - (\inf_{x' \in D} \hat{\bar m}_j(\theta,x') \wedge 0)| \le a_n[(\log n/(n h^{d_X}))^{1/2} \vee h^3]\}$.
By Theorem 9, $\hat{\mathcal{X}}_0^j$ will contain $\mathcal{X}_0^j$ with probability approaching one. Thus, if we can determine that
$V_j(x)$ is positive definite on $\hat{\mathcal{X}}_0^j$, then, asymptotically, we will know that $V_j(x)$ is positive
definite on $\mathcal{X}_0^j$. Note that $\hat{\mathcal{X}}_0^j$ is an estimate of the set of minimizers of $\bar m_j(\theta,x)$ over $x$ if
the moment inequality binds or fails to hold, and is eventually equal to the empty set if the
moment inequality is slack.

Since the determinant is a differentiable map from $\mathbb{R}^{d_X^2}$ to $\mathbb{R}$, the
$O_p((\log n/(n h^{d_X+4}))^{1/2}) + O_p(h)$ rate of uniform convergence for $\hat V_j(x)$ translates to the same (or
faster) rate of convergence for $\det \hat V_j(x)$. If, for some $x_0 \in \mathcal{X}_0^j$, $V_j(x_0)$ is not positive definite,
then $V_j(x_0)$ will be singular (the second derivative matrix at an interior minimum must be positive
semidefinite if the second derivatives are continuous in a neighborhood of $x_0$), and $\det V_j(x_0)$ will be zero.
Thus, $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) \le \det \hat V_j(x_0) = O_p((\log n/(n h^{d_X+4}))^{1/2}) + O_p(h)$ where the inequality
holds with probability approaching one. Thus, letting $b_n$ be any sequence going to infinity
such that $b_n[(\log n/(n h^{d_X+4}))^{1/2} \vee h]$ converges to zero, if $V_j(x_0)$ is not positive definite for
some $x_0 \in \mathcal{X}_0^j$, we will have $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) \le b_n[(\log n/(n h^{d_X+4}))^{1/2} \vee h]$ with probability
approaching one (actually, since we are only dealing with the point $x_0$, we can use results
for pointwise convergence of the second derivative of the conditional mean, so the $\log n$ term
can be replaced by a constant, but I use the uniform convergence results for simplicity).

Now, suppose $V_j(x)$ is positive definite for all $x \in \mathcal{X}_0^j$. By Lemma 1, we will have, for
some $B > 0$, $\det V_j(x) \ge B$ for all $x \in \mathcal{X}_0^j$. By continuity of $V_j(x)$, we will also have,
for some $\varepsilon > 0$, $\det V_j(x) \ge B/2$ for all $x \in \mathcal{X}_0^{j,\varepsilon}$ where
$\mathcal{X}_0^{j,\varepsilon} = \{x \,|\, \inf_{x' \in \mathcal{X}_0^j} \|x - x'\| \le \varepsilon\}$
is the $\varepsilon$-expansion of $\mathcal{X}_0^j$. Since $\hat{\mathcal{X}}_0^j \subseteq \mathcal{X}_0^{j,\varepsilon}$ with probability approaching one, we will also
have $\inf_{x \in \hat{\mathcal{X}}_0^j} \det V_j(x) \ge B/2$ with probability approaching one. Since $\det \hat V_j(x) \to \det V_j(x)$
uniformly over $D$, we will then have $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) > b_n[(\log n/(n h^{d_X+4}))^{1/2} \vee h]$ with
probability approaching one.

This gives the following theorem. 

Theorem 10. Let Vj{x) and mj{9,x) be the local second order polynomial estimates defined 
with some kernel K with h such that the rate of convergence terms in Theorem\^ go to zero. 
Let $\hat{\mathcal{X}}_0^j$ be defined as above with $a_n[(\log n/(nh^{d_X}))^{1/2} \vee h^3]$ going to zero and $a_n$ going to infinity, and let $b_n$ be any sequence going to infinity such that $b_n[(\log n/(nh^{d_X+4}))^{1/2} \vee h]$ goes
to zero. Suppose that Assumptions [3 El [23, andlTll hold, and the null hypothesis holds 
with $E(m(W_i,\theta)m(W_i,\theta)'|X_i = x)$ continuous and the data are iid. Then, if Assumption 1 holds, we will have $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) > b_n[(\log n/(nh^{d_X+4}))^{1/2} \vee h]$ for each $j$ with probability approaching one. If Assumption 1 does not hold, we will have $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) \le b_n[(\log n/(nh^{d_X+4}))^{1/2} \vee h]$ for some $j$ with probability approaching one.
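
The decision rule in this pre-test can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names and the array layout are hypothetical, the Hessian estimates at the points of the estimated contact set are taken as given (for example, from a local quadratic fit computed elsewhere), and the infimum is approximated by a minimum over a finite set of evaluation points.

```python
import numpy as np

def pretest_threshold(n, h, d_x, b_n):
    """Threshold b_n * max{ (log n / (n h^(d_x + 4)))^(1/2), h }."""
    rate = np.sqrt(np.log(n) / (n * h ** (d_x + 4)))
    return b_n * max(rate, h)

def hessians_pass_pretest(hessians, n, h, d_x, b_n):
    """Return True if the smallest determinant exceeds the threshold.

    `hessians` has shape (num_points, d_x, d_x): one estimated second
    derivative matrix per evaluation point in the estimated contact set
    (a hypothetical layout chosen for this sketch).
    """
    dets = np.linalg.det(np.asarray(hessians, dtype=float))
    return bool(dets.min() > pretest_threshold(n, h, d_x, b_n))
```

In a full procedure this check would be run for each moment index $j$, with the faster-than-root-n approximation used only if every index passes.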

The purpose of this test of Assumption 1 is as a preliminary consistent test in a procedure that uses the asymptotic approximation in Theorem 1 if the test finds evidence in favor of Assumption 1 and otherwise uses methods that are robust to different types of contact sets, but possibly conservative, such as those described in Andrews and Shi (2009). It follows from Theorem 10 that such a procedure will have the correct size asymptotically. In the statement of the following theorem, it is understood that the assumptions that refer to objects in Assumption 1, including Assumption 6, do not need to hold if the data generating process is such that Assumption 1 does not hold.

Theorem 11. Consider the following test. For some $b_n \to \infty$ and $h \to 0$ satisfying the conditions of Theorem 10, perform a pre-test that finds evidence in favor of Assumption
[3 iff. M^^.^^detVj{x) > 6„[(logn/(n/i'^^+^))^/2 y ^^^^ jj ^ 0^ ^g^g^^ 

the null hypothesis that $\theta \in \Theta_0$. If $\inf_{x \in \hat{\mathcal{X}}_0^j} \det \hat V_j(x) > b_n[(\log n/(nh^{d_X+4}))^{1/2} \vee h]$ for each $j$, reject the null hypothesis that $\theta \in \Theta_0$ if $n^{(d_X+2)/(d_X+4)} S(T_n(\theta)) > \hat q_{1-\alpha}$, where $\hat q_{1-\alpha}$ is an estimate of the $1 - \alpha$ quantile of the distribution of $S(Z)$ formed using one of the methods
in Section^ // inf ^g_;^o ^et Vj- (x) < 6„[(logn/(?2/i'^-^+''))^/^ V h] for some j, perform any 
(possibly conservative) asymptotically level a test. Suppose that Assumptions\^ El 
[Pl [70l and[Tl\ hold, E{m{Wi,6)m{Wi,6y\Xi = x) is continuous, and the data are iid. Then 
this provides an asymptotically level $\alpha$ test of $\theta \in \Theta_0$ if the subsampling procedure is used
or if Assumption holds and the procedure based on estimating the asymptotic distribution 
directly is used. If Assumption 1 holds, this test is asymptotically exact.

The estimates used for this pre-test can also be used to construct estimates of the quantities in Assumption 6 that satisfy the consistency requirements of this assumption. Suppose that we have estimates $\hat M(x)$, $\hat f_X(x)$, and $\hat V(x)$ of $E(m(W_i,\theta)m(W_i,\theta)'|X = x)$, $f_X(x)$, and $V(x)$ that are consistent uniformly over $x$ in a neighborhood of $\mathcal{X}_0$. Then, if we have estimates of $\mathcal{X}_0$ and $J(k)$, we can estimate the quantities in Assumption 6 using $\hat M_k(\hat x_k)$, $\hat f_X(\hat x_k)$, and $\hat V(\hat x_k)$ for each $\hat x_k$ in the estimate of $\mathcal{X}_0$, where $\hat M_k(\hat x_k)$ is a sparse version of $\hat M(\hat x_k)$ with elements with indices not in the estimate of $J(k)$ set to zero.

The estimate $\hat{\mathcal{X}}_0^j$ contains infinitely many points, so it will not work for this purpose. Instead, define the estimate $\hat{\mathcal{X}}_0$ of $\mathcal{X}_0$ and the estimate $\hat J(k)$ of $J(k)$ as follows. Let $a_n$ be as in Theorem 10, and let $\varepsilon_n \to 0$ more slowly than $a_n[(\log n/(nh^{d_X}))^{1/2} \vee h^3]$. Let $\hat\ell_j$ be the smallest number such that $\hat{\mathcal{X}}_0^j \subseteq \bigcup_{k=1}^{\hat\ell_j} B_{\varepsilon_n}(\hat x_{j,k})$ for some $\hat x_{j,1}, \ldots, \hat x_{j,\hat\ell_j}$. Define an equivalence relation $\sim$ on the set $\{(j,k) \mid 1 \le j \le d_Y, 1 \le k \le \hat\ell_j\}$ by $(j,k) \sim (j',k')$ iff. there is a sequence $(j,k) = (j_1,k_1), (j_2,k_2), \ldots, (j_r,k_r) = (j',k')$ such that $B_{\varepsilon_n}(\hat x_{j_s,k_s}) \cap B_{\varepsilon_n}(\hat x_{j_{s+1},k_{s+1}}) \ne \emptyset$ for $s$ from 1 to $r - 1$. Let $\hat\ell$ be the number of equivalence classes, and, for each equivalence class, pick exactly one $(j,k)$ in the equivalence class and let $\hat x_r = \hat x_{j,k}$ for some $r$ between 1 and $\hat\ell$. Define the estimate of the set $\mathcal{X}_0$ to be $\hat{\mathcal{X}}_0 = \{\hat x_1, \ldots, \hat x_{\hat\ell}\}$, and define the estimate $\hat J(r)$ for $r$ from 1 to $\hat\ell$ to be the set of indices $j$ for which some $(j,k)$ is in the same equivalence class as $\hat x_r$.
Although these estimates of $\mathcal{X}_0$, $\ell$, and $J(1), \ldots, J(\ell)$ require some cumbersome notation to define, the intuition behind them is simple. Starting with the initial estimates $\hat{\mathcal{X}}_0^j$, turn these sets into discrete sets of points by taking the centers of balls that contain the sets $\hat{\mathcal{X}}_0^j$ and converge at a slower rate. This gives estimates of the points at which the conditional moment inequality indexed by $j$ binds for each $j$, but to estimate the asymptotic distribution in Theorem 1, we also need to determine which components, if any, of $\bar m(\theta,x)$ bind at the same value of $x$. The procedure described above does this by testing whether the balls used to form the estimated contact points for each index of $\bar m(\theta,x)$ intersect across indices.
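
The grouping step described above amounts to finding connected components among the balls. A minimal sketch follows, assuming the ball centers for each moment index have already been computed and that all balls share one radius `eps`; the function name, the input layout, and the use of a union-find structure are choices made for this illustration, not part of the procedure as stated.

```python
import numpy as np

def cluster_contact_points(centers, eps):
    """Group ball centers into equivalence classes of chained intersections.

    `centers` is a list of (j, point) pairs, where j is a moment index and
    point is a coordinate vector. Two open balls of radius eps intersect
    when their centers are less than 2 * eps apart. Returns one
    (representative_point, set_of_moment_indices) pair per class.
    """
    pts = [np.atleast_1d(np.asarray(p, dtype=float)) for _, p in centers]
    parent = list(range(len(pts)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            if np.linalg.norm(pts[a] - pts[b]) < 2 * eps:
                parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for i, (j, _) in enumerate(centers):
        # The first point seen in a class serves as its representative.
        _, indices = classes.setdefault(find(i), (pts[i], set()))
        indices.add(j)
    return list(classes.values())
```

For example, with centers at 0.0 (index 1), 0.05 (index 2) and 0.8 (index 1) and `eps = 0.05`, the first two balls overlap and form one class with index set {1, 2}, while the third forms its own class with index set {1}.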

The following theorem shows that this is a consistent estimate of the set Xq and the 
indices of the binding moments. 

Theorem 12. Suppose that Assumptions [Jl 0, 0, [73, andlTl\ hold. For the estimates Xq, 
$\hat\ell$ and $\hat J(r)$, $\hat\ell = \ell$ with probability approaching one and, for some labeling of the indices of $\hat x_1, \ldots, \hat x_{\hat\ell}$, we have, for $k$ from 1 to $\ell$, $\hat x_k \to_p x_k$ and, with probability approaching one, $\hat J(k) = J(k)$.

An immediate consequence of this is that this estimate of $\mathcal{X}_0$ can be used in combination with consistent estimates of $E(m(W_i,\theta)m(W_i,\theta)'|X = x)$, $f_X(x)$, and $V(x)$ to form estimates of these functions evaluated at points in $\mathcal{X}_0$ that satisfy the assumptions needed for the
procedure for estimating the asymptotic distribution described in Section HI 

Corollary 3. If the estimates $\hat M_k(x)$, $\hat f_X(x)$, and $\hat V(x)$ are consistent uniformly over $x$ in
a neighborhood of Xq, then, under Assumptions\^ 0, 0, [70|, and Uli the estimates Mk{xk), 
fx{ik), o-nd Vj{xk) satisfy Assumption\^ 

7 Local Alternatives 

Consider local alternatives of the form $\theta_n = \theta_0 + a_n$ for some fixed $\theta_0$ such that $m(W_i,\theta_0)$ satisfies Assumption 1 and $a_n \to 0$. Here, I keep the data generating process fixed and vary the parameter being tested. Similar ideas will apply when the parameter is fixed and the data generating process is changed so that the parameter approaches the identified set. Throughout this section, I restrict attention to the conditions in Section 3, which correspond to the more general setup in Section 5 with $\gamma = 2$. To translate the $a_n$ rate of convergence to $\theta_0$ to a rate of convergence for the sequence of conditional means, I make the following assumptions. As before, define $\bar m(\theta,x) = E(m(W_i,\theta)|X_i = x)$.
Assumption 12. For each $x_k \in \mathcal{X}_0$, $\bar m(\theta,x)$ has a derivative as a function of $\theta$ in a neighborhood of $(\theta_0,x_k)$, denoted $\bar m_\theta(\theta,x)$, that is continuous as a function of $(\theta,x)$ at $(\theta_0,x_k)$ and, for any neighborhood of $x_k$, there is a neighborhood of $\theta_0$ such that $\bar m_j(\theta,x)$ is bounded away from zero for $\theta$ in the given neighborhood of $\theta_0$ and $x$ outside of the given neighborhood of $x_k$ for $j \in J(k)$ and for all $x$ for $j \notin J(k)$.

Assumption 13. For each $x_k \in \mathcal{X}_0$ and $j \in J(k)$, $E\{[m_j(W_i,\theta) - m_j(W_i,\theta_0)]^2 | X_i = x\}$ converges to zero uniformly in $x$ in some neighborhood of $x_k$ as $\theta \to \theta_0$.

I also make the following assumption, which extends Assumption 2 to a neighborhood of $\theta_0$.

Assumption 14. For some fixed $\bar Y < \infty$ and $\theta$ in some neighborhood of $\theta_0$, $|m(W_i,\theta)| \le \bar Y$ with probability one.

In the interval regression example, these conditions are satisfied as long as Assumption 1 holds at $\theta_0$ and the data have finite support. These conditions are also likely to hold in a variety of models once Assumption 1 holds at $\theta_0$. Note that smoothness conditions are in terms of the conditional mean $\bar m(\theta,x)$, rather than $m(W_i,\theta)$, so that the conditions can still hold when the sample moments are nonsmooth functions of $\theta$.

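Set $a_n = b_n a$ for some sequence of scalars $b_n \to 0$ and a constant vector $a$. Going through the argument for Theorem 1, the variance term in the local process is now

\[
\begin{aligned}
&\frac{\sqrt{n}}{\sqrt{h_n^{d_X}}} (E_n - E)\, m(W_i, \theta_0 + b_n a)\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\quad = \frac{\sqrt{n}}{\sqrt{h_n^{d_X}}} (E_n - E)\, m(W_i, \theta_0)\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\qquad + \frac{\sqrt{n}}{\sqrt{h_n^{d_X}}} (E_n - E)\, [m(W_i, \theta_0 + b_n a) - m(W_i, \theta_0)]\, I(h_n s < X_i - x_k < h_n(s+t)).
\end{aligned}
\]

The first term is the variance term under the null, and the second term should be small under Assumption 13.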
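As for the drift term,

\[
\begin{aligned}
&\frac{1}{h_n^{d_X+2}} E\, m(W_i, \theta_0 + b_n a)\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\quad = \frac{1}{h_n^{d_X+2}} E\, m(W_i, \theta_0)\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\qquad + \frac{1}{h_n^{d_X+2}} E\, [m(W_i, \theta_0 + b_n a) - m(W_i, \theta_0)]\, I(h_n s < X_i - x_k < h_n(s+t)).
\end{aligned}
\]

The first term is the drift term under the null. The second term is

\[
\begin{aligned}
&\frac{1}{h_n^{d_X+2}} E\, [\bar m(\theta_0 + b_n a, X_i) - \bar m(\theta_0, X_i)]\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\quad \approx \frac{b_n}{h_n^{d_X+2}} E\, \bar m_\theta(\theta_0, X_i) a\, I(h_n s < X_i - x_k < h_n(s+t)) \\
&\quad \approx \frac{b_n}{h_n^{d_X+2}} f_X(x_k) \bar m_\theta(\theta_0, x_k) a \int_{h_n s < x - x_k < h_n(s+t)} dx
= \frac{b_n}{h_n^{2}} f_X(x_k) \bar m_\theta(\theta_0, x_k) a \prod_i t_i.
\end{aligned}
\]

Setting $b_n = h_n^2 = n^{-2/(d_X+4)}$ gives a constant that does not change with $n$, so we should expect to have power against $n^{-2/(d_X+4)}$ alternatives. The following theorem formalizes these ideas.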

Theorem 13. Let $\theta_0$ be such that $E(m(W_i,\theta_0)|X_i) \ge 0$ almost surely and Assumptions 1, 12, 13, and 14 are satisfied for $\theta_0$. Let $a \in \mathbb{R}^{d_\theta}$ and let $a_n = a n^{-2/(d_X+4)}$. Let $Z(a)$ be a random variable defined the same way as $Z$ in Theorem 1, but with the functions $g_{P,x_k,j}(s,t)$ replaced by the functions

\[
g_{P,x_k,j,a}(s,t) = \frac{1}{2} f_X(x_k) \int_{s<x<s+t} x' V_j(x_k) x \, dx + \bar m_{\theta,j}(\theta_0, x_k) a f_X(x_k) \prod_i t_i
\]

for $j \in J(k)$ for each $k$, where $\bar m_{\theta,j}$ is the $j$th row of the derivative matrix $\bar m_\theta$. Then

\[
n^{(d_X+2)/(d_X+4)} \inf_{s,t} E_n m(W_i, \theta_0 + a_n) I(s < X_i < s+t) \to_d Z(a).
\]

Thus, an exact test gives power against $n^{-2/(d_X+4)}$ alternatives (as long as $\bar m_{\theta,j}(\theta_0, x_k) a$ is negative for each $j$ or negative enough for at least one $j$), but not against alternatives that converge strictly faster. The dependence on the dimension of $X_i$ is a result of the curse of dimensionality. With a fixed amount of "smoothness," the speed at which local alternatives can converge to the null space and still be detected is decreasing in the dimension of $X_i$.
Now consider power against local alternatives of this form, with a possibly different sequence $a_n$, using the conservative estimate that $\sqrt{n} \inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t) \to_p 0$ for $\theta \in \Theta_0$. That is, we fix some $\eta > 0$ and reject if $\sqrt{n} S(\inf_{s,t} E_n m(W_i, \theta_0 + a_n) I(s < X_i < s+t)) > \eta$. For the drift term $E m_j(W_i, \theta_0 + a_n) I(s < X_i - x_k < s+t)$ of the local alternative, we have, for $t$ near zero and $s$ near any $x_k \in \mathcal{X}_0$,

\[
\begin{aligned}
&\sqrt{n}\, E\, m_j(W_i, \theta_0 + a_n)\, I(s < X_i - x_k < s+t) \\
&\quad \approx \sqrt{n}\, E\, [\bar m_j(\theta_0, X_i) + \bar m_{\theta,j}(\theta_0, X_i) a_n]\, I(s < X_i - x_k < s+t) \\
&\quad \approx \sqrt{n}\, f_X(x_k) \int_{s<x<s+t} \left( \frac{1}{2} x' V_j(x_k) x + \bar m_{\theta,j}(\theta_0, x_k) a_n \right) dx.
\end{aligned}
\]

For any $a$ and $b$,

\[
\begin{aligned}
f_X(x_k) \int_{s<x<s+t} \left( \frac{1}{2} x' V x + a \right) dx
&= (a/b) f_X(x_k) \int_{s<x<s+t} \left( \frac{1}{2} [(b/a)^{1/2} x]' V [(b/a)^{1/2} x] + b \right) dx \\
&= (a/b) f_X(x_k) \int_{(b/a)^{1/2} s < u < (b/a)^{1/2}(s+t)} \left( \frac{1}{2} u' V u + b \right) (b/a)^{-d_X/2} \, du.
\end{aligned}
\]

For any $(s,t)$, the last line in the display is equal to $(a/b)^{(d_X+2)/2}$ times the first expression in the display evaluated at a different value of $(s,t)$ with $a$ replaced with $b$. It follows that the minimized expression for $b$ is $(b/a)^{(d_X+2)/2}$ times the minimized expression for $a$. Thus, if $a_n = a b_n$, the drift term is of order $\sqrt{n}\, b_n^{(d_X+2)/2}$, so we should expect to have power against local alternatives with $\sqrt{n}\, b_n^{(d_X+2)/2} = O(1)$ or $b_n = n^{-1/(d_X+2)}$ (note that setting $n^{(d_X+2)/(d_X+4)} b_n^{(d_X+2)/2} = O(1)$, so that the drift term is of the same order of magnitude as the exact rate of convergence, gives the $n^{-2/(d_X+4)}$ rate derived in the previous theorem for the exact test). Since the infimum of the drift term is taken at a point where $t$ is small, we should expect the mean zero term to converge at a faster than $\sqrt{n}$ rate, so that the limiting distribution will be degenerate. This is formalized in the following theorem.

Theorem 14. Let $\theta_0$ be such that $E(m(W_i,\theta_0)|X_i) \ge 0$ almost surely and Assumptions 1, 12, 13, and 14 are satisfied for $\theta_0$. Let $a \in \mathbb{R}^{d_\theta}$ and let $a_n = a n^{-1/(d_X+2)}$. Then, for each $j$,

\[
\sqrt{n} \inf_{s,t} E_n m_j(W_i, \theta_0 + a_n) I(s < X_i < s+t)
\to_p \min_{k \text{ s.t. } j \in J(k)} \inf_{s,t} f_X(x_k) \int_{s<x<s+t} \left( \frac{1}{2} x' V_j(x_k) x + \bar m_{\theta,j}(\theta_0, x_k) a \right) dx.
\]

The $n^{-1/(d_X+2)}$ rate is slower than the $n^{-2/(d_X+4)}$ rate for detecting local alternatives with the asymptotically exact test. As with the asymptotically exact tests, the conservative tests do worse against this form of local alternative as the dimension of the conditioning variable $X_i$ increases.
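
To make the comparison concrete, the exponents from Theorems 13 and 14 can be tabulated for a few values of $d_X$. This is a small illustrative helper (the function name is hypothetical); it only evaluates the formulas $2/(d_X+4)$, $1/(d_X+2)$ and $(d_X+2)/(d_X+4)$ stated above.

```python
from fractions import Fraction

def local_alternative_exponents(d_x):
    """Exponents r such that alternatives of order n^(-r) are detected.

    Returns (exact, conservative, scaling): the exact test detects
    n^(-2/(d_x+4)) alternatives, the conservative test detects
    n^(-1/(d_x+2)) alternatives, and the KS statistic is scaled by
    n^((d_x+2)/(d_x+4)) under the exact approximation.
    """
    exact = Fraction(2, d_x + 4)
    conservative = Fraction(1, d_x + 2)
    scaling = Fraction(d_x + 2, d_x + 4)
    return exact, conservative, scaling

for d_x in (1, 2, 5):
    print(d_x, *local_alternative_exponents(d_x))
```

For $d_X = 1$ this gives $2/5$ for the exact test and $1/3$ for the conservative test; the exact exponent is strictly larger for every $d_X \ge 1$, and both shrink as $d_X$ grows.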

8 Monte Carlo 

I perform a Monte Carlo study to examine the finite sample behavior of the tests I propose, and to see how well the asymptotic results in this paper describe the finite sample behavior of KS statistics. First, I simulate the distribution of KS statistics for various sample sizes under parameter values and data generating processes that satisfy Assumption 1, and for data generating processes that lead to a $\sqrt{n}$ rate of convergence. As predicted by Theorem 1, for the data generating process that satisfies Assumption 1, the distribution of the KS statistic is roughly stable across sample sizes when scaled up by $n^{(d_X+2)/(d_X+4)}$. For the data generating process that leads to $\sqrt{n}$ convergence, scaling by $\sqrt{n}$ gives a distribution that is stable across sample sizes. Next, I examine the size and power of KS statistic based tests using the asymptotic distributions derived in this paper. I include procedures that test between the conditions leading to $\sqrt{n}$ convergence and the faster rates derived in this paper using the subsampling estimates of the rate of convergence described in Section 6.1, as well as infeasible procedures that use prior knowledge of the correct rate of convergence to estimate the asymptotic distribution.

8.1 Monte Carlo Designs 

Throughout this section, I consider two Monte Carlo designs for a mean regression model with missing data. In this model, the latent variable $W_i^*$ satisfies $E(W_i^*|X_i) = \theta_1 + \theta_2 X_i$, but $W_i^*$ is unobserved, and can only be bounded by the observed variables $W_i^H = \bar w I(W_i^* \text{ missing}) + W_i^* I(W_i^* \text{ observed})$ and $W_i^L = \underline w I(W_i^* \text{ missing}) + W_i^* I(W_i^* \text{ observed})$, where $[\underline w, \bar w]$ is an interval known to contain $W_i^*$. The identified set $\Theta_0$ is the set of values of $(\theta_1, \theta_2)$ such that the moment inequalities $E(W_i^H - \theta_1 - \theta_2 X_i|X_i) \ge 0$ and $E(\theta_1 + \theta_2 X_i - W_i^L|X_i) \ge 0$ hold with probability one. For both designs, I draw $X_i$ from a uniform distribution on $(-1, 1)$ (here, $d_X = 1$). Conditional on $X_i$, I draw $U_i$ from an independent uniform $(-1, 1)$ distribution, and set $W_i^* = \theta_{1,*} + \theta_{2,*} X_i + U_i$, where $\theta_{1,*} = 0$ and $\theta_{2,*} = .1$. I then set $W_i^*$ to be missing with probability $p^*(X_i)$ for some function $p^*$ that differs across designs. I set $[\underline w, \bar w] = [-.1 - 1, .1 + 1] = [-1.1, 1.1]$, the unconditional support of $W_i^*$. Note that, while the data are generated using a particular value of $\theta$ in the identified set and a censoring process that satisfies the missing at random assumption (that the probability of data missing conditional on $(X_i, W_i^*)$ does not depend on $W_i^*$), the data generating process is consistent with forms of endogenous censoring that do not satisfy this assumption. The identified set contains all values of $\theta$ for which the data generating process is consistent with the latent variable model for $\theta$ and some, possibly endogenous, censoring mechanism.

The shape of the conditional moment inequalities as a function of $X_i$ depends on $p^*$. For Design 1, I set $p^*(x) = (0.9481x^4 + 1.0667x^3 - 0.6222x^2 - 0.6519x + 0.3889) \wedge 1$. The coefficients of this quartic polynomial were chosen to make $p^*(x)$ smooth, but somewhat wiggly, so that the quadratic approximation to the resulting conditional moments used in Theorem 1 will not be good over the entire support of $X_i$. The resulting conditional means of the bounds on $W_i^*$ are $E(W_i^H|X_i = x) = (1 - p^*(x))(\theta_{1,*} + \theta_{2,*}x) + p^*(x)\bar w$ and $E(W_i^L|X_i = x) = (1 - p^*(x))(\theta_{1,*} + \theta_{2,*}x) + p^*(x)\underline w$. In the Monte Carlo study, I examine the distribution of the KS statistic for the upper inequality at $(\theta_{1,D1}, \theta_{2,D1}) = (1.05, .1)$, a parameter value on the boundary of the identified set for which Assumption 1 holds, along with confidence intervals for the intercept parameter $\theta_1$ with the slope parameter $\theta_2$ fixed at .1. For the confidence regions, I also restrict attention to the moment inequality corresponding to $W_i^H$, so that the confidence regions are for the one sided model with only this conditional moment inequality. Figure 3 plots the conditional means of $W_i^H$ and $W_i^L$, along with the regression line corresponding to $\theta = (1.05, .1)$. The confidence intervals for the intercept parameter invert a family of tests corresponding to values of $\theta$ that move this regression line vertically.

For Design 2, I set $p^*(x) = [(|x - .5| \vee .25) - .15] \wedge .7$. Figure 4 plots the resulting conditional means. For this design, I examine the distribution of the KS statistic for the upper inequality at $(\theta_{1,D2}, \theta_{2,D2}) = (1.1, .9)$, which leads to a positive probability contact set for the upper moment inequality and a $n^{1/2}$ rate of convergence to a nondegenerate distribution. The regression line corresponding to this parameter is plotted in Figure 4 as well. For this design, I form confidence intervals for the intercept parameter $\theta_1$ with $\theta_2$ fixed at .9, using the KS statistic for the moment inequality for $W_i^H$.
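
The two data generating processes can be sketched as follows. This is a minimal illustration, not the code used for the reported results: the function and constant names are hypothetical, and the numerical constants are taken from the design description above as read here ($\theta_{1,*} = 0$, $\theta_{2,*} = .1$, bounds $\pm 1.1$, and the two missingness probability functions).

```python
import numpy as np

W_LO, W_HI = -1.1, 1.1        # known interval containing the latent outcome
THETA_STAR = (0.0, 0.1)       # intercept and slope used to generate the data

def p_design1(x):
    """Missingness probability for Design 1 (quartic, capped at 1)."""
    poly = 0.9481 * x**4 + 1.0667 * x**3 - 0.6222 * x**2 - 0.6519 * x + 0.3889
    return np.minimum(poly, 1.0)

def p_design2(x):
    """Missingness probability for Design 2 (flat at .1 on [.25, .75])."""
    return np.minimum(np.maximum(np.abs(x - 0.5), 0.25) - 0.15, 0.7)

def simulate(n, p_missing, rng):
    """Draw (x, w_low, w_high) for one Monte Carlo sample of size n."""
    x = rng.uniform(-1.0, 1.0, size=n)
    u = rng.uniform(-1.0, 1.0, size=n)
    w_star = THETA_STAR[0] + THETA_STAR[1] * x + u
    missing = rng.uniform(size=n) < p_missing(x)
    # Missing outcomes are replaced by the interval endpoints.
    w_low = np.where(missing, W_LO, w_star)
    w_high = np.where(missing, W_HI, w_star)
    return x, w_low, w_high
```

A call such as `simulate(1000, p_design1, np.random.default_rng(0))` returns one sample; missingness here depends only on `x`, matching the missing at random mechanism used to generate the data.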

The confidence intervals reported in this section are computed by inverting KS statistic 
based tests on a grid of parameter values. I use a grid with meshwidth .01 that covers the 
area of the parameter space with distance to the boundary of the identified set no more than 
1. In practice, monotonicity of the KS statistic in certain parameters (in this case, the KS 
statistic for each moment inequality is monotonic in the intercept parameter) can often be 
used to get a rough estimate of the boundary of the identified set before mapping out the 
confidence region exactly. In this case, a rough estimate of the boundary of the identified 
set for the intercept parameter could be formed by finding the point where the KS statistic for the moment inequality for $W_i^H$ crosses a fixed critical value before performing the test with critical values estimated for each value of $\theta$. All of the results in this section use 1000 Monte Carlo draws for each sample size and Monte Carlo design.
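
The two steps described above (a rough boundary estimate that exploits monotonicity, followed by test inversion on a grid) could look roughly like this. The sketch assumes a user-supplied function `rejects(value)` that returns True when the test rejects the given parameter value; that function, and the names below, are hypothetical placeholders.

```python
import numpy as np

def invert_tests(grid, rejects):
    """Confidence set on a grid: keep the points the test does not reject."""
    return np.array([g for g in grid if not rejects(g)])

def upper_endpoint_by_bisection(lo, hi, rejects, tol=0.01):
    """Rough upper endpoint when rejection is monotone between lo and hi.

    Assumes `lo` is not rejected, `hi` is rejected, and every value above
    the first rejected value is also rejected.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejects(mid):
            hi = mid
        else:
            lo = mid
    return lo
```

The bisection step only locates the boundary to within `tol`; the grid inversion around that point then uses the critical values estimated separately at each parameter value.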

8.2 Distribution of the KS Statistic 

To examine how well Theorem 1 describes the finite sample distribution of KS statistics under Assumption 1, I simulate from Design 1 for a range of sample sizes and form the KS statistic for testing $(\theta_{1,D1}, \theta_{2,D1})$. Since Assumption 1 holds for testing this value of $\theta$ under this data generating process, Theorem 1 predicts that the distribution of the KS statistic scaled up by $n^{(d_X+2)/(d_X+4)} = n^{3/5}$ should be similar across the sample sizes. The performance of this asymptotic prediction in finite samples is examined in Figure 5, which plots histograms of the scaled KS statistic $n^{3/5} S(T_n(\theta))$ for the sample sizes $n \in \{100, 500, 1000, 2000, 5000\}$. The scaled distributions appear roughly stable across sample sizes, as predicted.

In contrast, under Design 2, the KS statistic for testing $(\theta_{1,D2}, \theta_{2,D2})$ will converge at a $n^{1/2}$ rate to a nondegenerate distribution. Thus, asymptotic approximation suggests that, in this case, scaling by $n^{1/2}$ will give a distribution that is roughly stable across sample sizes. Figure 6 plots histograms of the scaled statistic $n^{1/2} S(T_n(\theta))$ for this case. The scaling suggested by asymptotic approximations appears to give a distribution that is stable across sample sizes here as well.
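
For a scalar conditioning variable and a single moment, the statistic $S(T_n(\theta))$ with $T_n(\theta) = \inf_{s,t} E_n m(W_i,\theta) I(s < X_i < s+t)$ can be computed in one pass, since each interval picks out a contiguous run of observations sorted by $X_i$. The sketch below assumes no ties in $X_i$ and takes the moment values $m(W_i,\theta)$ as given; the function names are hypothetical, and the minimum-sum-run recursion is a standard device chosen for this illustration.

```python
import numpy as np

def ks_statistic(x, m):
    """S(T_n) = |min(T_n, 0)| for scalar x and one moment function.

    T_n is the infimum over intervals (s, s+t) of the sample mean of
    m_i * 1{s < x_i < s+t}, i.e. the smallest sum over a contiguous run
    of the x-sorted moment values, divided by n. The empty interval
    contributes 0, so T_n is never positive.
    """
    m_sorted = np.asarray(m, dtype=float)[np.argsort(x)]
    best = running = 0.0
    for value in m_sorted:
        running = min(running + value, value)  # smallest run ending here
        best = min(best, running)
    return -best / len(m_sorted)

def scaled_ks(x, m, d_x=1):
    """Scale the statistic by n^((d_x+2)/(d_x+4)), as in the exact approximation."""
    n = len(m)
    return n ** ((d_x + 2) / (d_x + 4)) * ks_statistic(x, m)
```

With several moment functions, the same routine would be applied to each component and the maximum taken.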

8.3 Finite Sample Performance of the Tests 

I now turn to the finite sample performance of confidence regions for the identified set based on critical values formed using the asymptotic approximations derived in this paper, along with possibly conservative confidence regions that use the $n^{1/2}$ approximation. The critical values use subsampling with different assumed rates of convergence. I report results for the tests based on subsampling estimates of the rate of convergence described in Section 6.1, tests that use the conservative rate $n^{1/2}$, and infeasible tests that use a $n^{3/5}$ rate under Design 1, and a $n^{1/2}$ rate under Design 2. The implementation details are as follows. For the critical values using the conservative rate of convergence, I estimate the .9 and .95 quantiles of the distribution of the KS statistic at each value of $\theta$ using subsampling, and add the correction factor .001 to prevent the critical value from going to zero. The critical values using estimated rates of convergence are computed as described in Section 6.1. I use the subsample sizes
bi = [n^/^l and 62 = to estimate the rate of convergence (3 for subsampling, and 

62 = 5 for the rate estimate Pa that is used to test whether the conservative rate should be 
used. For both rate estimates, I average the estimates computed using the quantiles .5, .9, and .95. For the upper and lower truncation points for the rate of convergence, I use $\underline\beta = .55$ and $\bar\beta = 2/3$. These truncation points allow for exact inference for values of $\theta$ such that Assumption 7 holds with $\gamma = 2$ (twice differentiable conditional mean) or $\gamma = 1$ (directional derivatives from both sides). The upper truncation point $\bar\beta$ corresponds to $\gamma = 1$, and the lower truncation point $\underline\beta$ is halfway between the rate of convergence exponent 3/5 for $\gamma = 2$ and the conservative rate exponent 1/2. In addition, I truncate $\hat\beta$ from below at 1/2 in cases where $\hat\beta < 1/2$. For both the conservative and estimated rates of convergence, I use the
uncentered subsampling estimate with subsample size [n^^^] . All subsampling estimates use 
1000 subsample draws. For values of $\theta$ such that the pre-test finds that the conservative approximation should be used ($\hat\beta_a < \underline\beta$), I use the same method of estimating the critical values as in the tests that always use the conservative rate of convergence.
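
As a rough illustration of the uncentered subsampling step with an assumed rate of convergence, consider the sketch below. It is not the exact implementation used here: the function names are hypothetical, `statistic` stands for any function that maps a data array to the unscaled KS statistic, and adding the correction factor on the scale of the rescaled statistic is an assumption of this sketch.

```python
import numpy as np

def subsampling_critical_value(data, statistic, rate, b, level, draws, rng):
    """Quantile of b^rate * statistic over subsamples drawn without replacement."""
    n = len(data)
    stats = np.empty(draws)
    for r in range(draws):
        idx = rng.choice(n, size=b, replace=False)
        stats[r] = b ** rate * statistic(data[idx])
    return np.quantile(stats, level)

def reject(data, statistic, rate, b, level, draws, rng, correction=0.0):
    """Reject when n^rate * statistic exceeds the subsampling critical value."""
    n = len(data)
    crit = subsampling_critical_value(data, statistic, rate, b, level, draws, rng)
    return n ** rate * statistic(data) > crit + correction
```

Passing `rate=0.5` corresponds to the conservative approximation, while `rate=0.6` corresponds to the exact rate for $d_X = 1$ under Assumption 1; an estimated rate can be plugged in the same way.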

Table 1 reports the coverage probabilities for $(\theta_{1,D1}, \theta_{2,D1})$ under Design 1. As discussed above, under Design 1, $(\theta_{1,D1}, \theta_{2,D1})$ is on the boundary of the identified set and satisfies Assumption 1. As predicted, the tests that subsample with the $n^{1/2}$ rate are conservative. The nominal 95% confidence regions that use the $n^{1/2}$ rate cover $(\theta_{1,D1}, \theta_{2,D1})$ with probability at least .99 for all of the sample sizes. Subsampling with the exact $n^{3/5}$ rate of convergence, an infeasible procedure that uses prior knowledge that Assumption 1 holds under $(\theta_{1,D1}, \theta_{2,D1})$ for this data generating process, gives confidence regions that cover $(\theta_{1,D1}, \theta_{2,D1})$ with probability much closer to the nominal coverage. The subsampling tests with the estimated rate of convergence also perform well, attaining close to the nominal coverage.

Table 2 reports coverage probabilities for testing $(\theta_{1,D2}, \theta_{2,D2})$ under Design 2. In this case, subsampling with a $n^{1/2}$ rate gives an asymptotically exact test of $(\theta_{1,D2}, \theta_{2,D2})$, so we should expect the coverage probabilities for the tests that use the $n^{1/2}$ rate of convergence to be close to the nominal coverage probabilities, rather than being conservative. The coverage probabilities for the $n^{1/2}$ rate are generally less conservative here than for Design 1, as the asymptotic approximations predict, although the coverage is considerably greater than the nominal coverage, even with 5000 observations. In this case, the infeasible procedure is identical to the conservative test, since the exact rate of convergence is $n^{1/2}$. The confidence regions that use subsampling with the estimated rate contain $(\theta_{1,D2}, \theta_{2,D2})$ with probability close to the nominal coverage, but are generally more liberal than their nominal level.

Given that subsampling with the estimated rate increases type I error by having coverage probability close to the nominal coverage probability rather than being conservative, we should expect a decrease in type II error. The results in Section 7 show that critical values based on the exact $n^{3/5}$ rate of convergence lead to tests that detect local alternatives that approach the identified set at a $n^{-2/(d_X+4)} = n^{-2/5}$ rate, while the conservative tests detect local alternatives that approach the identified set at a slower $n^{-1/(d_X+2)} = n^{-1/3}$ rate. For confidence regions that invert these tests, this is reflected in the portion of the parameter space the confidence region covers outside of the true identified set.

Tables 3 and 4 summarize the portion of the parameter space outside of the identified set covered by confidence intervals for the intercept parameter $\theta_1$ with $\theta_2$ fixed at $\theta_{2,D1}$ for Design 1 and $\theta_{2,D2}$ for Design 2. The entries in each table report the upper endpoint of one of the confidence regions minus the upper endpoint of the identified set for the intercept parameter, averaged over the Monte Carlo draws. As discussed above, the true upper endpoint of the identified set for $\theta_1$ under Design 1 with $\theta_2$ fixed at $\theta_{2,D1}$ is $\theta_{1,D1}$, and the true upper endpoint of the identified set for $\theta_1$ under Design 2 with $\theta_2$ fixed at $\theta_{2,D2}$ is $\theta_{1,D2}$. Thus, letting $\hat u_{1-\alpha}$ be the greatest value of $\theta_1$ such that $(\theta_1, \theta_{2,D1})$ is not rejected, Table 3 reports averages of $\hat u_{1-\alpha} - \theta_{1,D1}$, and similarly for Table 4 and Design 2.

The results of Section 7 suggest that, for the results for Design 1 reported in Table 3, the difference between the upper endpoint of the confidence region and the upper endpoint of the identified set should decrease at a $n^{-2/5}$ rate for the critical values that use or estimate the exact rate of convergence (the first and third rows), and a $n^{-1/3}$ rate for subsampling with the conservative rate and adding .001 to the critical value (the second row). This appears roughly consistent with the values reported in these tables. The conservative confidence regions start out slightly larger, and then converge more slowly. For Design 2, the KS statistic converges at a $n^{1/2}$ rate on the boundary of the identified set for $\theta_1$ for $\theta_2$ fixed at $\theta_{2,D2}$, and arguments in Andrews and Shi (2009) show that the $n^{1/2}$ approximation to the KS statistic gives power against sequences of alternatives that approach the identified set at a $n^{-1/2}$ rate. The confidence regions do appear to shrink to the identified set at approximately this rate over most sample sizes, although the decrease in the width of the confidence region is larger than predicted for smaller sample sizes, perhaps reflecting time taken by the subsampling procedures to find the binding moments.

9 Illustrative Empirical Application 



As an illustrative empirical application, I apply the methods in this paper to regressions 
of out of pocket prescription drug spending on income using data from the Health and 
Retirement Study (HRS). In this survey, respondents who did not report point values for 
these and other variables were asked whether the variables were within a series of brackets, 
giving point values for some observations and intervals of different sizes for others. The 
income variable used here is taken from the RAND contribution to the HRS, which adds 
up reported income from different sources elicited in the original survey. For illustrative 
purposes, I focus on the subset of respondents who report point values for income, so that 
only prescription drug spending, the dependent variable, is interval valued. The resulting 
confidence regions are valid under any potentially endogenous process governing the size of 
the reported interval for prescription expenditures, but require that income be missing or 
interval reported at random. Methods similar to those proposed in this paper could also 
be used along with the results of Manski and Tamer (2002) for interval reported covariates 
to use these additional observations to potentially gain identifying power (but still using an 
assumption of exogenous interval reporting for income). I use the 1996 wave of the survey 
and restrict attention to women with no more than $15,000 of yearly income who report 
using prescription medications. This results in a data set with 636 observations. Of these 
observations, 54 have prescription expenditures reported as an interval of nonzero width with 
finite endpoints, and an additional 7 have no information on prescription expenditures. 

To describe the setup formally, let $X_i$ and $W_i^*$ be income and prescription drug expenditures for the $i$th observation. We observe $(X_i, W_i^L, W_i^H)$, where $[W_i^L, W_i^H]$ is an interval that contains $W_i^*$. For observations where no interval is reported for prescription drug spending, I set $W_i^L = 0$ and $W_i^H = \infty$. I estimate an interval median regression model where the median $q_{1/2}(W_i^*|X_i)$ of $W_i^*$ given $X_i$ is assumed to follow a linear regression model $q_{1/2}(W_i^*|X_i) = \theta_1 + \theta_2 X_i$. This leads to the conditional moment inequalities $E(m(W_i,\theta)|X_i) \ge 0$ almost surely, where $m(W_i,\theta) = (I(\theta_1 + \theta_2 X_i \le W_i^H) - 1/2,\ 1/2 - I(\theta_1 + \theta_2 X_i \le W_i^L))$ and $W_i = (X_i, W_i^L, W_i^H)$.
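
The moment function for the interval median regression can be written down directly. The following sketch uses a hypothetical function name and assumes the data are held in arrays, with missing outcomes coded as $W_i^L = 0$ and $W_i^H = \infty$ as described above.

```python
import numpy as np

def interval_median_moments(theta, x, w_low, w_high):
    """Evaluate m(W_i, theta) for the interval median regression.

    Returns an (n, 2) array whose columns are
    1{theta_1 + theta_2 x_i <= w_high_i} - 1/2 and
    1/2 - 1{theta_1 + theta_2 x_i <= w_low_i}.
    """
    fitted = theta[0] + theta[1] * np.asarray(x, dtype=float)
    upper = (fitted <= np.asarray(w_high, dtype=float)).astype(float) - 0.5
    lower = 0.5 - (fitted <= np.asarray(w_low, dtype=float)).astype(float)
    return np.column_stack([upper, lower])
```

An observation with no information on spending contributes 1/2 to the first component at every parameter value, and contributes 1/2 to the second component whenever the fitted median is positive.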

Figure [7| shows the data graphically. The horizontal axis measures income, while the 
vertical axis measures out of pocket prescription drug expenditures. Observations for which 
prescription expenditures are reported as a point value are plotted as points. For obser- 
vations where a nontrivial interval is reported, a plus symbol marks the upper endpoint, 
and an x marks the lower endpoint. For observations where no information on prescription 
expenditures is obtained in the survey, a circle is placed on the x axis at the value of income 
reported for that observation. In order to show in detail the ranges of spending that contain 
most of the observations, the vertical axis is truncated at $15,000, leading to 5 observations 
not being shown (although these observations are used in forming the confidence regions 
reported below). 

I form 95% confidence intervals by inverting level .05 tests using the KS statistics described in this paper with critical values calculated using the conservative rate of convergence $n^{1/2}$, and rates of convergence estimated using the methods described in Section 6.1. For the function $S$, I set $S(t) = \max_j |t_j \wedge 0|$. The rest of the implementation details are the same as for the Monte Carlos in Section 8.

For comparison, I also compute point estimates and confidence regions using the least absolute deviations (LAD) estimator (Koenker and Bassett, 1978) for the median regression model with only the observations for which a point value for spending was reported. These are valid under the additional assumption that the decision to report an interval or missing value is independent of spending conditional on income. The confidence regions use Wald tests based on the asymptotic variance estimates computed by Stata. These asymptotic variance estimates are based on formulas in Koenker and Bassett and require additional assumptions on the data generating process, but I use these rather than more robust standard errors in order to provide a comparison to an alternative procedure using default options in a standard statistical package.

Figure 8 plots the outline of the 95% confidence region for $\theta$ using the pre-tests and rate of convergence estimates described above, while Figure 9 plots the outline of the 95% confidence region using the conservative approximation. Figure 10 plots the outline of the 95% confidence region from estimating a median regression model on the subset of the data with point values reported for spending. Table 5 reports the corresponding confidence intervals for the components of $\theta$. For the confidence regions based on KS tests, I use the projections of the confidence region for $\theta$ onto each component. For the confidence regions based on median regression with point observations, the 95% confidence regions use the limiting normal approximation for each component of $\theta$ separately.

The results show a sizeable increase in statistical power from using the estimated rates of 
convergence. With the conservative tests, the 95% confidence region estimates that a $1,000 
increase in income is associated with at least a $3 increase in out of pocket prescription 
spending at the median. With the tests that use the estimated rates of convergence, the 
95% confidence region bounds the increase in out of pocket prescription spending associated 
with a $1,000 increase in income from below by $11.30.

The 95% confidence region based on median regression using observations reported as
points overlaps with both moment inequality based confidence regions, but gives a different 
picture of which parameter values can be ruled out by the data. The upper bound for 
the increase in spending associated with a $1,000 increase in income is $24.40 using LAD, 
compared to $37.20 and $34.70 using KS statistics with all observations and the conservative 
and estimated rates respectively. The corresponding lower bound is $10 using LAD with 
point observations, substantially larger than the lower bound of $3 using the conservative 
procedure, but actually smaller than the $11.30 lower bound under the estimated rate. Thus, 
while the interval reporting at random assumption for the dependent variable allows one to 
tighten the upper bound for the slope parameter, a lower bound close to the lower bound 
of the LAD confidence interval can be obtained using the new asymptotic approximations 
developed in this paper. 

Note also that these tests could, but do not, provide evidence against the assumptions 
required for LAD on the point reported values. If the LAD 95% confidence region did 
not overlap with one of the moment inequality 95% confidence regions, there would be no 
parameter value consistent with this assumption at the .1 level (for any parameter value, 
we can reject the joint null of both models holding using Bonferroni's inequality and the 
results of the .05 level tests). This type of test will not necessarily have power if the interval 
reporting at random assumption for the dependent variable does not hold, so it should not 
be taken as evidence that the more robust interval regression assumptions can be replaced 
with LAD methods. 

10 Discussion 

Under some smoothness conditions, the asymptotic approximations derived in Sections 3 and
5 can be combined with the methods in Sections 4 and 6 to form tests that are asymptotically
exact on portions of the boundary of the identified set where the $\sqrt{n}$ approximation only
allows for conservative inference. Since these methods require assumptions on the conditional
mean that are not needed for conservative inference using the $\sqrt{n}$ approximation, the decision
of which method to use involves a tradeoff between power and robustness. The results in
Section 7 quantify these tradeoffs. While approximations to the distribution of a KS statistic
based on the asymptotic distribution in Section 3 and the tests in Sections 4 and 6 may not
be robust to certain types of nonsmooth conditional means, when they are valid, they can
detect parameters in a $n^{-2/(d_X+4)}$ region of the identified set, while the $\sqrt{n}$ approximation
can only detect parameters in a $n^{-1/(d_X+2)}$ region of the identified set. It should be noted
that, even if the pre-tests in Section 6 find a rate of convergence that is too fast, Lemma 6
in the Appendix shows that the rate of convergence will typically be within $\log n$ of $1/n$ for
testing $\theta$ on the interior of the identified set, so the resulting confidence region, while failing
to contain values of $\theta$ near the boundary of the identified set with high probability, will not
be too much smaller than the true identified set.
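
The two detection rates can be traced back to a short back-of-the-envelope calculation. The display below is a heuristic sketch only, not the formal argument of Section 7. It assumes a conditional mean that is locally quadratic around a contact point and is shifted down by an amount of order $a_n$ under the alternative; $c_1, c_2 > 0$ are unspecified constants and $h$ is the side length of the box used in the moment.

```latex
% Heuristic sketch: c_1, c_2, a_n and h are illustrative symbols, not notation from the paper.
\begin{align*}
E\,Y_i I(\text{box of side } h)
  &\approx c_1 h^{d_X+2} - c_2\, a_n h^{d_X}, \\
\text{minimizing over } h \text{ gives } h \asymp a_n^{1/2}
  &\quad\text{and a minimum of order } -a_n^{(d_X+2)/2}, \\
n^{(d_X+2)/(d_X+4)}\, a_n^{(d_X+2)/2} \asymp 1
  &\iff a_n \asymp n^{-2/(d_X+4)}, \\
n^{1/2}\, a_n^{(d_X+2)/2} \asymp 1
  &\iff a_n \asymp n^{-1/(d_X+2)}.
\end{align*}
```

The third line uses the exact scaling of the statistic and the fourth uses the conservative $\sqrt{n}$ scaling, which is why the exact approximation detects alternatives that are closer to the identified set.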

The results in this paper also shed light on the tradeoff between the KS statistics based
on integrating conditional moments to get unconditional moments considered in this paper
and other methods for inference with conditional moment inequalities, such as those based
on kernel or series estimation (Chernozhukov, Lee, and Rosen, 2009; Ponomareva, 2010) or
increasing numbers of unconditional moments (Menzel, 2008). With the bandwidth chosen
to decrease at the correct rate, kernel methods based on a supremum statistic will give close
to the same $n^{-2/(d_X+4)}$ rate (up to a power of $\log n$) for detecting the local alternatives
considered in this paper. With enough derivatives imposed on the conditional mean, higher
order kernels or series methods could be used to get even more power. However, kernel based
methods will perform worse with suboptimal bandwidth choices, or against local alternatives
in which the conditional moment inequality fails to hold on a larger set. The $n^{-2/(d_X+4)}$ rate
for detecting local alternatives can also be achieved within a $\log n$ term using the increasing
truncation point variance weighting proposed in Armstrong (2011). Unlike the tests proposed
in this paper, those methods are robust to nonsmooth conditional means. These tests also
have the advantage of adapting to different shapes of the conditional mean without estimating
the optimal bandwidth, as would be necessary with kernel estimates, or estimating the rate
of convergence of a test statistic, as required by the tests in this paper. However, they have
less power by a $\log n$ term when applied to this setting, and require choosing a conservative
critical value, which decreases the power further (but not the rate at which local alternatives
can converge to the identified set and still be detected).

While the results in this paper and Armstrong (2011) characterize how moment selection
and weighting functions affect relative efficiency in this setting, the choice of test statistic
(supremum norm, as considered here, or $L^p$ norm, as with Cramér-von Mises statistics) and
instrument functions are also of interest. While the results in this paper and in Armstrong
(2011) give some insight into these problems (for example, it is clear from the arguments
in these papers that Cramér-von Mises style statistics will have less power in this setting
unless new asymptotic distribution results or moment selection procedures are used), more
complete answers to these questions are topics of ongoing research.

It is also interesting to compare the nonsimilarity problem with the statistics in this paper
to nonsimilarity problems encountered with kernel based methods. The rate of convergence
of supremum statistics based on kernel estimates of the conditional moments also depends on
the contact set, but to a lesser extent. Ponomareva (2010) shows that the rate of convergence
of these statistics differs by a factor of $\log n$ depending on the contact set. Arguing as in
Section 6 of Armstrong (2011), this would lead to an increase in the rate at which local
alternatives can approach the identified set and still be detected by a factor of $\log n$. In
contrast, the polynomial difference in the rates of convergence of the KS statistics based
on integrated moments considered in the present paper leads to increases in local power
by factors of $n$ rather than $\log n$. Thus, the gains in terms of power from using exact
approximations are much larger in this context.

In addition to these immediate practical applications, the results in this paper are also 
of independent interest in their relation to broader questions in the literatures on moment 
inequalities and nonparametric estimation. In testing multiple moment inequalities, the 
asymptotic distribution of test statistics typically only depends on inequalities that bind as 
equalities. Since the non binding moments do affect the finite sample distribution of the 
test statistic, this means that asymptotic distributions may provide poor approximations 
to finite sample distributions. The existing literature on moment inequalities has taken 
several approaches to this issue. One is to use conservative approximations using "least 
favorable" asymptotic distributions in which all moment inequalities bind. Another approach 
is to design tests that are robust to sequences where the data generating process or test
statistic changes as the number of observations increases. Menzel (2008) considers asymptotic
approximations in which the number of moment inequalities used for a test statistic increases
with the number of observations. Andrews and Shi (2009) show that the tests they consider
using test statistics similar to the ones in this paper, but using a (possibly degenerate) $\sqrt{n}$
asymptotic distribution, have the correct size asymptotically when data generating processes
change with the sample size within certain classes of data generating processes. Since these
classes of data generating processes include sequences where some moment inequalities are
slack, but close to binding, this suggests that the methods they propose will not suffer from
problems with nonbinding inequalities affecting the finite sample distribution.

In contrast, the asymptotic distributions presented in Sections 3 and 5 of the present
paper are, to my knowledge, the first known case of the asymptotic distribution of a test
statistic that uses a fixed (although, in this case, infinite) set of moment inequalities depending
on moment inequalities that do not bind. These results show that, under the conditions
in this paper, the "moment selection" problem takes the form of a balancing of expected value
and variance of moments that are close to binding. This leads to ideas typically associated
with kernel smoothing and nonstandard M-estimation applying to test statistics for moment
inequalities. As with the objective functions for nonstandard M-estimation considered by
Kim and Pollard (1990), the asymptotic distribution of the KS statistic is the limit of local
processes under a scaling that balances a drift term and a variance term. This balancing
of drift and variance terms mirrors the equating of bias and variance terms in choosing the
optimal bandwidth for nonparametric kernel estimation (see, for example, Pagan and Ullah,
1999). This is especially interesting since one of the appealing features of KS style statistics
in this setting is that they get rid of the need for bandwidth parameters. In the settings I
consider, the choice of "bandwidth" is made automatically by the balancing of the drift and
variance terms, which determines the scale of the moments that matter asymptotically.
However, this shows up in the rate of convergence, so that tests to determine which "bandwidth"
was chosen are still needed for exact inference. Thus, in a sense, the bandwidth selection
problem shows up in the moment selection problem through the rate of convergence.

In another paper (Armstrong, 2011), I show that KS statistics similar to the ones in the
present paper can be made to choose the moments that correspond to the optimal bandwidth
by using a variance weighting with an increasing sequence of truncation points. This helps
alleviate the problem with different rates of convergence of the KS statistic along the boundary
of the identified set, but loses a $\log n$ term relative to the tests based on unweighted KS
statistics (or KS statistics with bounded weights) and asymptotic approximations based on
the exact rate of convergence. Thus, moment selection (in the form of testing for rates of
convergence) and variance weighting play similar roles in this framework. Even without the
variance weighting of Armstrong (2011), the statistics in this paper find the moments that
lead to the most local power. Estimating the rate of convergence of the test statistic is only
needed to find the order of magnitude (under the null) of the moments that were found.

The results in this paper are pointwise in the underlying distribution P. Since the pro- 
cedures proposed in this paper involve pre tests, it is natural to ask for which classes of 
underlying distributions these tests are uniformly valid. Since uniformity in the underlying 
distribution is implicit in the bounds used in many of the arguments used to derive these 
asymptotic distributions, it seems likely that these tests could be shown to enjoy uniformity 
in classes of distributions with uniform bounds on the constants governing the smoothness 
conditions needed for the pointwise results. While this would be an interesting extension
of the results in the paper, uniformity in the underlying distribution is perhaps less
interesting than in other settings because many of the tradeoffs between the approach in the
present paper and more conservative approaches are already clear from the pointwise
results. Smoothness conditions not needed for the conservative approach to control the size
uniformly in the underlying distribution are needed even for the pointwise results derived
here. Thus, it is clear from the pointwise results that the power improvement achieved by
the tests in this paper comes at a cost of robustness to smoothness conditions.

Many of the results in this paper assume that the conditional mean $\bar m(\theta, x)$ is minimized
only on a finite set. For the case where $d_X = 1$, this is implied by smoothness conditions
on the conditional mean (or, when it does not hold, the results in this paper bound the
rate of convergence so that the tests based on estimated rates are still valid). In higher
dimensions, the case where the contact set has infinitely many points but is of a dimension
less than $d_X$ is likely to be more difficult, but similar ideas will apply. The results in
this paper could also be extended to the case where $\bar m(\theta, x)$ only approaches 0 near the
(possibly infinite) boundary of the support of $x$. These cases are often relevant in performing
inference on bounds on treatment effects such as those considered by Manski (1990). In the
one dimensional case, $X_i$ can be transformed into a uniform random variable so that the
conditions on the density of the conditioning variable used in this paper will apply (once the
density is positive and well behaved on its support, the assumption that the contact point is
on the interior of the support is easy to relax). If the density and conditional mean approach
zero at polynomial rates, the transformed model will fit into a slight extension of Theorem 6
for some $\gamma$ that depends on these rates. These transformations are used in a slightly different
setting in Armstrong (2011).



11 Conclusion 



This paper derives the asymptotic distribution of a class of Kolmogorov-Smirnov style test
statistics for conditional moment inequality models under a general set of conditions. I show
how to use these results to form valid tests that are more powerful than existing approaches
based on this statistic. Local power results for the new tests and existing tests are derived,
which quantify this power improvement. While the increase in power comes at a cost of
robustness to smoothness conditions, a complementary paper (Armstrong, 2011) proposes
methods for inference that achieve almost the same power improvement while still being
robust to failure of smoothness conditions.

In addition to their immediate practical application to asymptotically exact inference,
the results in this paper add to our understanding of how familiar issues in the literatures on
moment inequalities and nonparametric estimation, such as moment selection and the curse
of dimensionality, manifest themselves in the use of one sided KS statistics for conditional
moment inequalities. Under the conditions in this paper, the asymptotic distribution of the
KS statistic depends on nonbinding moments, which are determined through a balancing of
a bias term and a variance term in a way that is similar to the objective functions for the
point estimators considered by Kim and Pollard (1990). The dimension of the conditioning
variables and the smoothness of the conditional mean determine which moments matter
asymptotically and which types of local alternatives the KS statistic can detect.



Appendix 

This appendix contains proofs of the theorems in this paper. The proofs are organized
into subsections according to the section containing the theorem in the body of the paper.
In cases where a result follows immediately from other theorems or arguments in
the body of the paper, I omit a separate proof. Statements involving convergence in
distribution in which random elements in the converging sequence are not measurable with
respect to the relevant Borel sigma algebra are in the sense of outer weak convergence (see
van der Vaart and Wellner, 1996). For notational convenience, I use $d = d_X$ throughout this
appendix.



Asymptotic Distribution of the KS Statistic 

In this subsection of the appendix, I prove Theorem 1. For notational convenience, let
$Y_i = m(W_i, \theta)$ and $Y_{i,J(m)} = m_{J(m)}(W_i, \theta)$ and let $d = d_X$ and $k = d_Y$ throughout this
subsection.

The asymptotic distribution comes from the behavior of the objective function
$E_n Y_{i,j} I(s < X_i < s + t)$ for $(s,t)$ near $x_m$ such that $j \in J(m)$. The bulk of the proof involves
showing that the objective function doesn't matter for $(s,t)$ outside of neighborhoods of $x_m$ with
$j \in J(m)$ where these neighborhoods shrink at a fast enough rate. First, I derive the limiting
distribution over such shrinking neighborhoods and the rate at which they shrink.

Theorem 15. Let $h_n = n^{-\alpha}$ for some $0 < \alpha < 1/d$. Let

\[ \mathbb{G}_{n,x_m}(s,t) = \frac{\sqrt{n}}{h_n^{d/2}} (E_n - E) Y_{i,J(m)} I(h_n s < X_i - x_m < h_n(s+t)) \]

and let $g_{n,x_m}(s,t)$ have $j$th element

\[ g_{n,x_m,j}(s,t) = \frac{1}{h_n^{d+2}} E Y_{i,j} I(h_n s < X_i - x_m < h_n(s+t)) \]

if $j \in J(m)$ and zero otherwise. Then, for any finite $M$,
$(\mathbb{G}_{n,x_1}(s,t), \ldots, \mathbb{G}_{n,x_\ell}(s,t)) \stackrel{d}{\to} (\mathbb{G}_{P,x_1}(s,t), \ldots, \mathbb{G}_{P,x_\ell}(s,t))$
taken as random processes on $\|(s,t)\| \le M$ with the supremum norm and
$g_{n,x_m}(s,t) \to g_{P,x_m}(s,t)$ uniformly in $\|(s,t)\| \le M$ where $\mathbb{G}_{P,x_m}(s,t)$ and $g_{P,x_m}(s,t)$
are defined as in Theorem 1 for $m$ from 1 to $\ell$.



Proof. The convergence in distribution in the first statement follows from verifying the
conditions of Theorem 2.11.22 in van der Vaart and Wellner (1996). To derive the covariance
kernel, note that

\begin{align*}
& \mathrm{cov}(\mathbb{G}_{n,x_m}(s,t), \mathbb{G}_{n,x_m}(s',t')) \\
&= h_n^{-d} E Y_{i,J(m)} Y_{i,J(m)}' I\{h_n(s \vee s') < X_i - x_m < h_n[(s+t) \wedge (s'+t')]\} \\
&\quad - h_n^{-d} \{E Y_{i,J(m)} I[h_n s < X_i - x_m < h_n(s+t)]\} \{E Y_{i,J(m)}' I[h_n s' < X_i - x_m < h_n(s'+t')]\}.
\end{align*}

The second term goes to zero as $n \to \infty$. The first is equal to the claimed covariance kernel
plus the error term

\[ h_n^{-d} \int_{h_n(s \vee s') < x - x_m < h_n[(s+t) \wedge (s'+t')]} [E(Y_{i,J(m)} Y_{i,J(m)}' | X = x) f_X(x) - E(Y_{i,J(m)} Y_{i,J(m)}' | X = x_m) f_X(x_m)] \, dx, \]

which is bounded by

\begin{align*}
& \sup_{\|x - x_m\| \le 2 h_n M} \| E(Y_{i,J(m)} Y_{i,J(m)}' | X = x) f_X(x) - E(Y_{i,J(m)} Y_{i,J(m)}' | X = x_m) f_X(x_m) \| \\
&\quad \times h_n^{-d} \int_{h_n(s \vee s') < x - x_m < h_n[(s+t) \wedge (s'+t')]} dx \\
&= \sup_{\|x - x_m\| \le 2 h_n M} \| E(Y_{i,J(m)} Y_{i,J(m)}' | X = x) f_X(x) - E(Y_{i,J(m)} Y_{i,J(m)}' | X = x_m) f_X(x_m) \| \\
&\quad \times \int_{(s \vee s') < x - x_m < (s+t) \wedge (s'+t')} dx.
\end{align*}

This goes to zero as $n \to \infty$ by continuity of $E(Y_{i,J(m)} Y_{i,J(m)}' | X = x) f_X(x)$. For $m \ne r$
and $\|(s,t)\| \le M$, $\|(s',t')\| \le M$, $\mathrm{cov}(\mathbb{G}_{n,x_m}(s,t), \mathbb{G}_{n,x_r}(s',t'))$ is eventually equal to

\[ -h_n^{-d} \{E Y_{i,J(m)} I[h_n s < X_i - x_m < h_n(s+t)]\} \{E Y_{i,J(r)}' I[h_n s' < X_i - x_r < h_n(s'+t')]\}, \]

which goes to zero, so the processes for different elements of $\mathcal{X}_0$ are independent as claimed.

For the claim regarding $g_{n,x_m}(s,t)$, first note that the assumptions imply that, for
$j \in J(m)$, the first derivative of $x \mapsto E(Y_{i,j} | X = x)$ at $x_m$ is 0, and that this function has a
second order Taylor expansion:

\[ E(Y_{i,j} | X = x) = \frac{1}{2} (x - x_m)' V_j(x_m) (x - x_m) + R_n(x) \]

where

\[ R_n(x) = \frac{1}{2} (x - x_m)' V_j(x^*(x)) (x - x_m) - \frac{1}{2} (x - x_m)' V_j(x_m) (x - x_m) \]

and $V_j(x^*)$ is the second derivative matrix evaluated at some $x^*(x)$ between $x_m$ and $x$.
We have

\begin{align*}
g_{n,x_m,j}(s,t) &= \frac{1}{2 h_n^{d+2}} \int_{h_n s < x - x_m < h_n(s+t)} (x - x_m)' V_j(x_m) (x - x_m) f_X(x_m) \, dx \\
&\quad + \frac{1}{2 h_n^{d+2}} \int_{h_n s < x - x_m < h_n(s+t)} (x - x_m)' V_j(x_m) (x - x_m) [f_X(x) - f_X(x_m)] \, dx \\
&\quad + \frac{1}{h_n^{d+2}} \int_{h_n s < x - x_m < h_n(s+t)} R_n(x) f_X(x) \, dx.
\end{align*}

The first term is equal to $g_{P,x_m,j}(s,t)$ by a change of variable $x$ to $h_n x + x_m$ in the integral.
The second term is bounded by $g_{P,x_m,j}(s,t) \sup_{\|x - x_m\| \le 2 h_n M} |f_X(x) - f_X(x_m)| / f_X(x_m)$, which
goes to zero uniformly in $\|(s,t)\| \le M$ by continuity of $f_X$. The third term is equal to (using
the same change of variables)

\[ \frac{1}{2} \int_{s < x < s+t} [x' V_j(x^*(h_n x + x_m)) x - x' V_j(x_m) x] f_X(h_n x + x_m) \, dx. \]

This is bounded by a constant times $\sup_{\|x\| \le 2M} |x' V_j(x^*(h_n x + x_m)) x - x' V_j(x_m) x|$, which goes
to zero as $n \to \infty$ by continuity of the second derivatives. □

Thus, if we let $h_n$ be such that $\sqrt{n}/h_n^{d/2} = 1/h_n^{d+2} \iff h_n = n^{-1/(d+4)}$ and scale up by
$n^{(d+2)/(d+4)}$, we will have

\begin{align*}
& n^{(d+2)/(d+4)} \left( E_n Y_{i,J(1)} I(h_n s < X_i - x_1 < h_n(s+t)), \ldots, E_n Y_{i,J(\ell)} I(h_n s < X_i - x_\ell < h_n(s+t)) \right) \\
&= (\mathbb{G}_{n,x_1}(s,t) + g_{n,x_1}(s,t), \ldots, \mathbb{G}_{n,x_\ell}(s,t) + g_{n,x_\ell}(s,t)) \\
&\stackrel{d}{\to} (\mathbb{G}_{P,x_1}(s,t) + g_{P,x_1}(s,t), \ldots, \mathbb{G}_{P,x_\ell}(s,t) + g_{P,x_\ell}(s,t))
\end{align*}

taken as stochastic processes in $\{\|(s,t)\| \le M\}$ with the supremum norm. From now on, let
$h_n = n^{-1/(d+4)}$ so that this will hold.
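
The choice of $h_n$ can be checked by writing out the balance between the two scalings explicitly. The display below is a short worked version of that step, using the scalings $\sqrt{n}/h_n^{d/2}$ for the centered process and $1/h_n^{d+2}$ for the drift term; $I(\cdot)$ abbreviates the indicator $I(h_n s < X_i - x_m < h_n(s+t))$, and the two numerical cases in the comment are added only for illustration.

```latex
% Worked balance of the variance scaling and the drift scaling.
% Illustration: d = 1 gives h_n = n^{-1/5} and scaling n^{3/5};
%               d = 2 gives h_n = n^{-1/6} and scaling n^{2/3}.
\begin{align*}
\frac{\sqrt{n}}{h_n^{d/2}} = \frac{1}{h_n^{d+2}}
  &\iff \sqrt{n} = h_n^{-(d+4)/2}
  \iff h_n = n^{-1/(d+4)}, \\
\frac{1}{h_n^{d+2}} &= n^{(d+2)/(d+4)}, \\
n^{(d+2)/(d+4)} E_n Y_{i,j} I(\cdot)
  &= \frac{\sqrt{n}}{h_n^{d/2}} (E_n - E) Y_{i,j} I(\cdot)
   + \frac{1}{h_n^{d+2}} E\, Y_{i,j} I(\cdot).
\end{align*}
```

The last line is an exact identity at this choice of $h_n$, so the scaled sample moment splits into a centered part and a drift part of the same order.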

We would like to show that the infimum of these stochastic processes over all of $\mathbb{R}^{2d}$
converges to the infimum of the limiting process over all of $\mathbb{R}^{2d}$, but this does not follow
immediately since we only have uniform convergence on compact sets. Another way of
thinking about this problem is that convergence in distribution in $\{\|(s,t)\| \le M\}$ with the
supremum norm for any $M$ implies convergence in distribution in $\mathbb{R}^{2d}$ with the topology of
uniform convergence on compact sets (see Kim and Pollard, 1990), but the infimum over all
of $\mathbb{R}^{2d}$ is not a continuous mapping on this space since uniform convergence on all compact
sets does not imply convergence of the infimum over all of $\mathbb{R}^{2d}$. To get the desired result, the
following lemma will be useful. The idea is to show that values of $(s,t)$ far away from zero
won't matter for the limiting distribution, and then use convergence for fixed compact sets.

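Lemma 2. Let $\mathbb{H}_n$ and $\mathbb{H}_P$ be random functions from $\mathbb{R}^{k_1}$ to $\mathbb{R}^{k_2}$ such that (i) for all $M$,
$\mathbb{H}_n \stackrel{d}{\to} \mathbb{H}_P$ when $\mathbb{H}_n$ and $\mathbb{H}_P$ are taken as random processes on $\{t \in \mathbb{R}^{k_1} \,|\, \|t\| \le M\}$ with the
supremum norm, (ii) for all $r < 0$, $\varepsilon > 0$, there exists an $M$ such that
$P(\inf_{\|t\| > M} \mathbb{H}_{P,j}(t) \le r \text{ some } j) < \varepsilon$ and an $N$ such that
$P(\inf_{\|t\| > M} \mathbb{H}_{n,j}(t) \le r \text{ some } j) < \varepsilon$ for all $n \ge N$ and (iii) $\inf_t \mathbb{H}_{n,j}(t) \le 0$
and $\inf_t \mathbb{H}_{P,j}(t) \le 0$ with probability one. Then
$\inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) \stackrel{d}{\to} \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t)$.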

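Proof. First, by the Cramér-Wold device, it suffices to show that, for all $w \in \mathbb{R}^{k_2}$,
$w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) \stackrel{d}{\to} w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t)$. For this, it suffices to show that for all $r \in \mathbb{R}$,
$\liminf_n P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) < r) \ge P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t) < r)$ and
$\limsup_n P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) \le r) \le P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t) \le r)$
since, arguing along the lines of the Portmanteau Lemma, when $r$ is a continuity point of
the limiting distribution, we will have

\begin{align*}
P\left(w' \inf_t \mathbb{H}_P(t) \le r\right) &= P\left(w' \inf_t \mathbb{H}_P(t) < r\right) \le \liminf_n P\left(w' \inf_t \mathbb{H}_n(t) < r\right) \\
&\le \liminf_n P\left(w' \inf_t \mathbb{H}_n(t) \le r\right) \le \limsup_n P\left(w' \inf_t \mathbb{H}_n(t) \le r\right) \le P\left(w' \inf_t \mathbb{H}_P(t) \le r\right).
\end{align*}

Given $\varepsilon > 0$, let $M$ and $N$ be as in the assumptions of the lemma, but with $r$ replaced
by $r/(k_2 \max_i |w_i|)$. Then

\[ P\left(w' \inf_{\|t\| > M} \mathbb{H}_P(t) \le r\right) \le P\left((k_2 \max_i |w_i|) \inf_{\|t\| > M} \mathbb{H}_{P,j}(t) \le r \text{ some } j\right) < \varepsilon \]

so that $P(w' \inf_{\|t\| \le M} \mathbb{H}_P(t) < r) + \varepsilon > P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t) < r)$ and, for $n \ge N$,

\[ P\left(w' \inf_{\|t\| > M} \mathbb{H}_n(t) \le r\right) \le P\left((k_2 \max_i |w_i|) \inf_{\|t\| > M} \mathbb{H}_{n,j}(t) \le r \text{ some } j\right) < \varepsilon \]

so that $P(w' \inf_{\|t\| \le M} \mathbb{H}_n(t) \le r) + \varepsilon > P(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) \le r)$. Thus, by convergence in
distribution of the infima over $\|t\| \le M$,

\begin{align*}
\liminf_n P\left(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) < r\right) &\ge \liminf_n P\left(w' \inf_{\|t\| \le M} \mathbb{H}_n(t) < r\right) \ge P\left(w' \inf_{\|t\| \le M} \mathbb{H}_P(t) < r\right) \\
&\ge P\left(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t) < r\right) - \varepsilon
\end{align*}

and

\begin{align*}
\limsup_n P\left(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_n(t) \le r\right) &\le \limsup_n P\left(w' \inf_{\|t\| \le M} \mathbb{H}_n(t) \le r\right) + \varepsilon \\
&\le P\left(w' \inf_{\|t\| \le M} \mathbb{H}_P(t) \le r\right) + \varepsilon \le P\left(w' \inf_{t \in \mathbb{R}^{k_1}} \mathbb{H}_P(t) \le r\right) + \varepsilon.
\end{align*}

Since $\varepsilon$ was arbitrary, this gives the desired result. □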

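Technically, this lemma does not apply to

\[ (\mathbb{G}_{n,x_1}(s,t) + g_{n,x_1}(s,t), \ldots, \mathbb{G}_{n,x_\ell}(s,t) + g_{n,x_\ell}(s,t)) \]

since, for $m \ne r$, $\mathbb{G}_{n,x_m}(s,t) + g_{n,x_m}(s,t)$ evaluated at some increasing values of $(s,t)$ may
actually be equal to $\mathbb{G}_{n,x_r}(s',t') + g_{n,x_r}(s',t')$ for some small values of $(s',t')$, since, once the
local indices are large enough, the original indices overlap. Instead, noting that, for any
$\eta > 0$,

\begin{align*}
& n^{(d+2)/(d+4)} \inf_{s,t} E_n Y_i I(s < X_i < s+t) \\
&= \Big( \min_{m \text{ s.t. } 1 \in J(m)} \inf_{\|(s,t)\| \le \eta/h_n} \mathbb{G}_{n,x_m,1}(s,t) + g_{n,x_m,1}(s,t), \ldots, \\
&\qquad \min_{m \text{ s.t. } k \in J(m)} \inf_{\|(s,t)\| \le \eta/h_n} \mathbb{G}_{n,x_m,k}(s,t) + g_{n,x_m,k}(s,t) \Big) \\
&\quad \wedge \; n^{(d+2)/(d+4)} \Big( \inf_{\|(s - x_m, t)\| > \eta \text{ all } m \text{ s.t. } 1 \in J(m)} E_n Y_{i,1} I(s < X_i < s+t), \ldots, \\
&\qquad \inf_{\|(s - x_m, t)\| > \eta \text{ all } m \text{ s.t. } k \in J(m)} E_n Y_{i,k} I(s < X_i < s+t) \Big) \\
&\equiv Z_{n,1} \wedge Z_{n,2},
\end{align*}

I show that, for some $\eta > 0$, $Z_{n,2} \stackrel{p}{\to} 0$ using a separate argument, and use Lemma 2 to show
that, for the same $\eta$,

\begin{align*}
& \Big( \inf_{s,t} [\mathbb{G}_{n,x_1}(s,t) + g_{n,x_1}(s,t)] I(\|(s,t)\| \le \eta/h_n), \ldots, \inf_{s,t} [\mathbb{G}_{n,x_\ell}(s,t) + g_{n,x_\ell}(s,t)] I(\|(s,t)\| \le \eta/h_n) \Big) \\
&\stackrel{d}{\to} \Big( \inf_{s,t} \mathbb{G}_{P,x_1}(s,t) + g_{P,x_1}(s,t), \ldots, \inf_{s,t} \mathbb{G}_{P,x_\ell}(s,t) + g_{P,x_\ell}(s,t) \Big),
\end{align*}

from which it follows that $Z_{n,1} \stackrel{d}{\to} Z$ for $Z$ defined as in Theorem 1 by the continuous
mapping theorem.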

Part (i) of Lemma 2 follows from Theorem 15 (the $I(\|(s,t)\| \le \eta/h_n)$ term does not
change this, since it is equal to one for $\|(s,t)\| \le M$ eventually). Part (iii) follows since the
processes involved are equal to zero when $t = 0$. To verify part (ii), first note that it suffices
to verify part (ii) of the lemma for $\mathbb{G}_{n,x_m,j}(s,t) + g_{n,x_m,j}(s,t)$ and $\mathbb{G}_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t)$
for each $m$ and $j$ individually. Part (ii) of the lemma holds trivially for $m$ and $j$ such that
$j \notin J(m)$, so we need to verify this part of the lemma for $m$ and $j$ such that $j \in J(m)$.

The next two lemmas provide bounds that will be used to verify condition (ii) of Lemma 2
for $\mathbb{G}_{n,x_m,j}(s,t) + g_{n,x_m,j}(s,t)$ and $\mathbb{G}_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t)$ for $m$ and $j$ with $j \in J(m)$. To do
this, the bounds in the lemmas are applied to sets of $(s,t)$ with $\|(s,t)\|$ increasing. The idea
is similar to the "peeling" argument of, for example, Kim and Pollard (1990), but different
arguments are required to deal with values of $(s,t)$ for which, even though $\|s\|$ is large, $\prod_i t_i$
is small so that the objective function on average uses only a few observations, which may
happen to be negative. To get bounds on the suprema of the limiting and finite sample
processes where $t$ may be small relative to $s$, the next two lemmas bound the supremum by
a maximum over $s$ in a finite grid of suprema over $t$ with $s$ fixed, and then use exponential
bounds on suprema of the processes with fixed $s$.

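Lemma 3. Fix $m$ and $j$ with $j \in J(m)$. For some $C > 0$ that depends only on $d$, $f_X(x_m)$
and $E(Y_{i,j}^2 | X = x_m)$, we have, for any $B \ge 1$, $\varepsilon > 0$, $w > 0$,

\[ P\left( \sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}_{P,x_m,j}(s,t)| \ge w \right) \le 2 \{3B[B^d/(\varepsilon \wedge 1)] + 2\}^{2d} \exp\left(-C \frac{w^2}{\varepsilon}\right) \]

for $w^2/\varepsilon$ greater than some constant that depends only on $d$, $f_X(x_m)$ and $E(Y_{i,j}^2 | X = x_m)$.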
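Proof. Let $\mathbb{G}(s,t) = \mathbb{G}_{P,x_m,j}(s,t)$. We have, for any $s_0 \le s \le s + t \le s_0 + t_0$,

\begin{align*}
\mathbb{G}(s,t) &= \mathbb{G}(s_0, t + s - s_0) \\
&\quad + \sum_{1 \le j \le d} (-1)^j \sum_{1 \le i_1 < i_2 < \cdots < i_j \le d}
\mathbb{G}(s_0, (t_1 + s_1 - s_{0,1}, \ldots, t_{i_1 - 1} + s_{i_1 - 1} - s_{0,i_1 - 1}, s_{i_1} - s_{0,i_1}, t_{i_1 + 1} + s_{i_1 + 1} - s_{0,i_1 + 1}, \\
&\qquad \ldots, t_{i_j - 1} + s_{i_j - 1} - s_{0,i_j - 1}, s_{i_j} - s_{0,i_j}, t_{i_j + 1} + s_{i_j + 1} - s_{0,i_j + 1}, \ldots, t_d + s_d - s_{0,d})).
\end{align*}

Thus, since there are $2^d$ terms in the above display, each with absolute value bounded by
$\sup_{t \le t_0} |\mathbb{G}(s_0, t)|$,

\[ \sup_{s_0 \le s \le s + t \le s_0 + t_0} |\mathbb{G}(s,t)| \le 2^d \sup_{t \le t_0} |\mathbb{G}(s_0, t)| \stackrel{d}{=} 2^d \sup_{t \le t_0} |\mathbb{G}(0, t)|. \]

Let $A$ be a grid of meshwidth $(\varepsilon \wedge 1)/B^d$ covering $[-B, 2B]^d$. For any $(s,t)$ with $\|(s,t)\| \le B$
and $\prod_i t_i \le \varepsilon$, there are $s_0$ and $t_0$ with $s_0, s_0 + t_0 \in A$ such that $s_0 \le s \le s + t \le s_0 + t_0$,
and

\begin{align*}
\prod_i t_{0,i} &\le \prod_i \left(t_i + (\varepsilon \wedge 1)/B^d\right)
= \sum_{j=0}^d [(\varepsilon \wedge 1)/B^d]^j \sum_{I \subseteq \{1,\ldots,d\}, |I| = d - j} \prod_{i \in I} t_i \\
&\le \prod_i t_i + \sum_{j=1}^d [(\varepsilon \wedge 1)/B^d]^j \binom{d}{d-j} B^{d-j}
\le \varepsilon + \varepsilon \sum_{j=1}^d \binom{d}{d-j} \le 2^d \varepsilon.
\end{align*}

For this $s_0, t_0$, we will then have, by the above display, $|\mathbb{G}(s,t)| \le 2^d \sup_{t \le t_0} |\mathbb{G}(s_0, t)|$.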
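
The decomposition of $\mathbb{G}(s,t)$ into $2^d$ terms anchored at $s_0$ is an inclusion-exclusion identity for box indicators, and it can be checked numerically. The sketch below is only such a check, under two assumptions made for exactness: boxes are taken to be half-open, $(s, s+t]$, and coordinates are integers so that no floating point rounding occurs at the boundaries. The function names are hypothetical.

```python
from itertools import combinations


def box_indicator(x, corner, widths):
    """Return 1 if corner_i < x_i <= corner_i + widths_i for every i, else 0."""
    return int(all(c < xi <= c + w for xi, c, w in zip(x, corner, widths)))


def indicator_via_inclusion_exclusion(x, s, t, s0):
    """Rebuild the indicator of the box (s, s+t] from boxes anchored at s0 <= s.

    For each subset of coordinates, the width in a selected coordinate is
    s_i - s0_i (the slab to be removed) and t_i + s_i - s0_i otherwise; the
    sign of each term is (-1) to the size of the subset.
    """
    d = len(s)
    total = 0
    for j in range(d + 1):
        for subset in combinations(range(d), j):
            widths = [s[i] - s0[i] if i in subset else t[i] + s[i] - s0[i]
                      for i in range(d)]
            total += (-1) ** j * box_indicator(x, s0, widths)
    return total


def count_terms(d):
    """Number of terms in the decomposition, which should equal 2**d."""
    return sum(1 for j in range(d + 1) for _ in combinations(range(d), j))
```

Since every term is an indicator of a box with lower corner $s_0$ and upper corner at most $s_0 + t_0$, taking absolute values term by term gives the factor $2^d$ in the bound.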

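This gives

\[ \sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}(s,t)| \le 2^d \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon} \sup_{t \le t_0} |\mathbb{G}(s_0, t)| \]

so that

\begin{align*}
P\left( \sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}(s,t)| \ge w \right)
&\le |A|^2 \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon} P\left( 2^d \sup_{t \le t_0} |\mathbb{G}(s_0, t)| \ge w \right) \\
&= |A|^2 \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon} P\left( 2^d \sup_{t \le 1} |\mathbb{G}(0, t)| \ge \frac{w}{(\prod_i t_{0,i})^{1/2}} \right) \\
&\le |A|^2 P\left( \sup_{t \le 1} |\mathbb{G}(0, t)| \ge \frac{w}{2^{3d/2} \varepsilon^{1/2}} \right).
\end{align*}

The result then follows using the fact that $|A| \le \{3B[B^d/(\varepsilon \wedge 1)] + 2\}^d$ and using Theorem
2.1 (p. 43) in Adler (1990) to bound the probability in the last line of the display (the
theorem in Adler (1990) shows that the probability in the above display is bounded by
$2\exp(-K_1 w^2/\varepsilon + K_2 w/\varepsilon^{1/2} + K_3)$ for some constants $K_1$, $K_2$, and $K_3$ with $K_1 > 0$ that depend
only on $d$, $f_X(x_m)$ and $E(Y_{i,j}^2 | X = x_m)$ and this expression is less than $2\exp(-(K_1/2) w^2/\varepsilon)$
for $w^2/\varepsilon$ greater than some constant that depends only on $K_1$, $K_2$, and $K_3$).

□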

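Lemma 4. Fix $m$ and $j$ with $j \in J(m)$. For some $C > 0$ that depends only on the distribution
of $(X, Y)$ and some $\eta > 0$, we have, for any $1 \le B \le h_n^{-1} \eta$, $w > 0$ and
$\varepsilon \ge n^{-4/(d+4)} (1 + \log n)^2$,

\[ P\left( \sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}_{n,x_m,j}(s,t)| \ge w \right) \le 2 \{3B[B^d/(\varepsilon \wedge 1)] + 2\}^{2d} \exp\left(-C \frac{w}{\varepsilon^{1/2}}\right). \]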

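Proof. Let $\mathbb{G}_n(s,t) = \mathbb{G}_{n,x_m,j}(s,t)$. By the same argument as in the previous lemma with $\mathbb{G}$
replaced by $\mathbb{G}_n$, we have

\[ \sup_{s_0 \le s \le s + t \le s_0 + t_0} |\mathbb{G}_n(s,t)| \le 2^d \sup_{t \le t_0} |\mathbb{G}_n(s_0, t)|. \]

As in the previous lemma, let $A$ be a grid of meshwidth $(\varepsilon \wedge 1)/B^d$ covering $[-B, 2B]^d$.
Arguing as in the previous lemma, we have, for any $(s,t)$ with $\|(s,t)\| \le B$ and $\prod_i t_i \le \varepsilon$,
there exists some $s_0, t_0$ with $s_0, s_0 + t_0 \in A$ such that $\prod_i t_{0,i} \le 2^d \varepsilon$ and
$|\mathbb{G}_n(s,t)| \le 2^d \sup_{t \le t_0} |\mathbb{G}_n(s_0, t)|$. Thus,

\begin{align*}
\sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}_n(s,t)|
&\le 2^d \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon} \sup_{t \le t_0} |\mathbb{G}_n(s_0, t)| \\
&= 2^d \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon} \sup_{t \le t_0} \left| \frac{\sqrt{n}}{h_n^{d/2}} (E_n - E) Y_{i,j} I(h_n s_0 < X_i - x_m < h_n(s_0 + t)) \right|.
\end{align*}

This gives

\begin{align*}
& P\left( \sup_{\|(s,t)\| \le B, \prod_i t_i \le \varepsilon} |\mathbb{G}_n(s,t)| \ge w \right) \\
&\le |A|^2 \max_{s_0, s_0 + t_0 \in A, \prod_i t_{0,i} \le 2^d \varepsilon}
P\left( 2^d \sup_{t \le t_0} \left| \frac{\sqrt{n}}{h_n^{d/2}} (E_n - E) Y_{i,j} I(h_n s_0 < X_i - x_m < h_n(s_0 + t)) \right| \ge w \right).
\end{align*}

We have, for some universal constant $K$ and all $n$ with $\varepsilon \ge n^{-4/(d+4)} (1 + \log n)^2$, letting
$\mathcal{F}_n = \{(x,y) \mapsto y_j I(h_n s_0 < x - x_m < h_n(s_0 + t)) \,|\, t \le t_0\}$ and defining $\|\cdot\|_{P,\psi_1}$ to be the
Orlicz norm defined on p. 90 of van der Vaart and Wellner (1996) for $\psi_1(x) = \exp(x) - 1$,

\begin{align*}
& \Big\| 2^d \sup_{f \in \mathcal{F}_n} |\sqrt{n}(E_n - E) f(X_i, Y_i)| \Big\|_{P,\psi_1} \\
&\le K \Big[ E \sup_{f \in \mathcal{F}_n} |\sqrt{n}(E_n - E) f(X_i, Y_i)| + n^{-1/2}(1 + \log n) \big\| |Y_{i,j}| I(h_n s_0 < X_i - x_m < h_n(s_0 + t_0)) \big\|_{P,\psi_1} \Big] \\
&\le K \Big[ J(1, \mathcal{F}_n, L^2) \{E[|Y_{i,j}| I(h_n s_0 < X_i - x_m < h_n(s_0 + t_0))]^2\}^{1/2} + n^{-1/2}(1 + \log n) \|Y_{i,j}\|_{P,\psi_1} \Big] \\
&\le K \Big[ J(1, \mathcal{F}_n, L^2) \overline{f}^{1/2} \overline{Y} h_n^{d/2} 2^{d/2} \varepsilon^{1/2} + n^{-1/2}(1 + \log n) \|Y_{i,j}\|_{P,\psi_1} \Big] \\
&\le K \Big[ J(1, \mathcal{F}_n, L^2) \overline{f}^{1/2} \overline{Y} 2^{d/2} + \|Y_{i,j}\|_{P,\psi_1} \Big] h_n^{d/2} \varepsilon^{1/2}.
\end{align*}

The first inequality follows by Theorem 2.14.5 in van der Vaart and Wellner (1996). The
second uses Theorem 2.14.1 in van der Vaart and Wellner (1996). The fourth inequality uses
the fact that $h_n^{d/2} \varepsilon^{1/2} = n^{-d/[2(d+4)]} \varepsilon^{1/2} \ge n^{-1/2}(1 + \log n)$ once
$\varepsilon^{1/2} \ge n^{-1/2 + d/[2(d+4)]}(1 + \log n) = n^{-2/(d+4)}(1 + \log n)$. Since each $\mathcal{F}_n$ is contained in the
larger class $\mathcal{F}$ defined in the same way but replacing $s_0$ with $s$, and allowing $(s,t)$ to vary over
all of $\mathbb{R}^{2d}$, we can replace $\mathcal{F}_n$ by $\mathcal{F}$ on the last line of this display. Since $J(1, \mathcal{F}, L^2)$ and
$\|Y_{i,j}\|_{P,\psi_1}$ are finite ($\mathcal{F}$ is a VC class and $Y_{i,j}$ is bounded), the bound is equal to
$C^{-1} \varepsilon^{1/2} h_n^{d/2}$ for a constant $C$ that depends only on the distribution of $(X_i, Y_i)$.

This bound along with Lemma 8.1 in Kosorok (2008) implies

\begin{align*}
& P\left( 2^d \sup_{t \le t_0} \left| \frac{\sqrt{n}}{h_n^{d/2}} (E_n - E) Y_{i,j} I(h_n s_0 < X_i - x_m < h_n(s_0 + t)) \right| \ge w \right) \\
&= P\left( 2^d \sup_{f \in \mathcal{F}_n} |\sqrt{n}(E_n - E) f(X_i, Y_i)| \ge w h_n^{d/2} \right) \\
&\le 2 \exp\left( - \frac{w h_n^{d/2}}{\big\| 2^d \sup_{f \in \mathcal{F}_n} |\sqrt{n}(E_n - E) f(X_i, Y_i)| \big\|_{P,\psi_1}} \right).
\end{align*}

The result follows using this and the fact that $|A| \le \{3B[B^d/(\varepsilon \wedge 1)] + 2\}^d$. □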
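
The passage from a bound on the $\psi_1$ Orlicz norm to an exponential tail bound is a one-line application of Markov's inequality. The display below is a sketch of that standard step, written for a generic random variable $X$ with $c = \|X\|_{P,\psi_1}$ assumed finite and positive; it is included as an illustration of the type of bound being used, not as a restatement of the cited lemma.

```latex
% With c = \|X\|_{P,\psi_1}, the definition of the Orlicz norm gives
% E[\exp(|X|/c) - 1] \le 1, that is, E \exp(|X|/c) \le 2.
\begin{align*}
P(|X| \ge x)
  = P\left(e^{|X|/c} \ge e^{x/c}\right)
  \le e^{-x/c}\, E\, e^{|X|/c}
  \le 2 \exp\left(-\frac{x}{\|X\|_{P,\psi_1}}\right).
\end{align*}
% Applying this with x = w h_n^{d/2} and \|X\|_{P,\psi_1} \le C^{-1} \varepsilon^{1/2} h_n^{d/2}
% gives a bound of 2 \exp(-C w / \varepsilon^{1/2}).
```

The factor $h_n^{d/2}$ cancels between the threshold and the norm bound, which is why the exponent in Lemma 4 depends only on $w/\varepsilon^{1/2}$.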

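The following theorem verifies the part of condition (ii) of Lemma 2 concerning the limiting process $G_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t)$.

Theorem 16. Fix $m$ and $j$ with $j \in J(m)$. For any $r < 0$, $\varepsilon > 0$ there exists an $M$ such that

\[
P\left(\inf_{\|(s,t)\| \ge M} G_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t) \le r\right) < \varepsilon.
\]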

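Proof. Let $G(s,t) = G_{P,x_m,j}(s,t)$ and $g(s,t) = g_{P,x_m,j}(s,t)$. Let $S_k = \{k \le \|(s,t)\| \le k+1\}$ and let $S_k^{\delta} = S_k \cap \{\prod_i t_i \le (k+1)^{-\delta}\}$ for some fixed $\delta$. By Lemma 3,

\begin{align*}
P\left(\inf_{S_k^{\delta}} G(s,t) + g(s,t) \le r\right) &\le P\left(\sup_{S_k^{\delta}} |G(s,t)| \ge |r|\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/k^{-\delta}] + 2\right\}^{2d}\exp\left(-Cr^2(k+1)^{\delta}\right)
\end{align*}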



for k large enough where C depends only on d. This bound is summable over k. 

For any $\alpha$ and $\beta$ with $\alpha < \beta$, let $S_k^{\alpha,\beta} = S_k \cap \{(k+1)^{\alpha} < \prod_i t_i \le (k+1)^{\beta}\}$. We have, for some $C_1 > 0$ that depends only on $d$ and $V(x_m)$, $g(s,t) \ge C_1\|(s,t)\|^2\prod_i t_i$. (To see this, note that $g(s,t)$ is greater than or equal to a constant times $\int_{s_1}^{s_1+t_1}\cdots\int_{s_d}^{s_d+t_d}\|x\|^2\,dx_d\cdots dx_1 = \left(\prod_{i=1}^d t_i\right)\sum_{i=1}^d\left(s_i^2 + t_i^2/3 + s_it_i\right)$, and the sum can be bounded below by a constant times $\|(s,t)\|^2$ by minimizing over $s_i$ for fixed $t_i$ using calculus. The claimed expression for the integral follows from evaluating the inner integral to get an expression involving the integral for $d-1$, and then using induction.) Using this and Lemma 3,

\begin{align*}
P\left(\inf_{S_k^{\alpha,\beta}} G(s,t) + g(s,t) \le r\right) &\le P\left(\sup_{S_k^{\alpha,\beta}} |G(s,t)| \ge C_1k^{2+\alpha}\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/((k+1)^{\beta}\wedge 1)] + 2\right\}^{2d}\exp\left(-CC_1^2\frac{k^{4+2\alpha}}{(k+1)^{\beta}}\right).
\end{align*}

This is summable over $k$ if $4 + 2\alpha - \beta > 0$.

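Now, note that, since $\prod_i t_i \le (k+1)^d$ on $S_k$, we have, for any $-\delta < \alpha_1 < \alpha_2 < \cdots < \alpha_{\ell-1} < \alpha_{\ell} = d$, $S_k = S_k^{\delta} \cup S_k^{-\delta,\alpha_1} \cup S_k^{\alpha_1,\alpha_2} \cup \ldots \cup S_k^{\alpha_{\ell-1},\alpha_{\ell}}$. If we choose $\delta < 3/2$ and $\alpha_i = i$ for $i \in \{1,\ldots,d\}$, the arguments above will show that the probability of the infimum being less than or equal to $r$ over $S_k^{\delta}$, $S_k^{-\delta,\alpha_1}$ and each $S_k^{\alpha_i,\alpha_{i+1}}$ is summable over $k$, so that $P\left(\inf_{S_k} G(s,t) + g(s,t) \le r\right)$ is summable over $k$, so setting $M$ so that the tail of this sum past $M$ is less than $\varepsilon$ gives the desired result. □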
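The choice of cut points in this proof can be checked mechanically. The sketch below is only an illustration: the function name `chain_is_summable` and its arguments are hypothetical, and it tests nothing more than the exponent condition $4 + 2\alpha - \beta > 0$ on each consecutive pair of the chain $-\delta < 1 < 2 < \cdots < d$.

```python
def chain_is_summable(d, delta, drift_power=4):
    """Check drift_power + 2*alpha - beta > 0 on every link (alpha, beta)
    of the chain -delta < 1 < 2 < ... < d used to split S_k."""
    cuts = [-delta] + list(range(1, d + 1))
    return all(drift_power + 2 * a - b > 0 for a, b in zip(cuts, cuts[1:]))


if __name__ == "__main__":
    # The binding link is (-delta, 1), which requires delta < 3/2.
    print(chain_is_summable(d=3, delta=1.4))
    print(chain_is_summable(d=3, delta=1.5))
```

The first link $(-\delta, 1)$ is the only one that restricts $\delta$; the links $(i, i+1)$ give $3 + i > 0$ automatically.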

The following theorem verifies condition (ii) of Lemma 2 for the sequence of finite sample processes $G_{n,x_m,j}(s,t) + g_{n,x_m,j}(s,t)$ with $\eta/h_n \ge \|(s,t)\|$. As explained above, the case where $\eta/h_n < \|(s,t)\|$ is handled by a separate argument.

Theorem 17. Fix $m$ and $j$ with $j \in J(m)$. There exists an $\eta > 0$ such that for any $r < 0$, $\varepsilon > 0$, there exists an $M$ and $N$ such that, for all $n \ge N$,

\[
P\left(\inf_{M \le \|(s,t)\| \le \eta/h_n} G_{n,x_m,j}(s,t) + g_{n,x_m,j}(s,t) \le r\right) < \varepsilon.
\]

Proof. Let $G_n(s,t) = G_{n,x_m,j}(s,t)$ and $g_n(s,t) = g_{n,x_m,j}(s,t)$. Let $\eta$ be small enough that the assumptions hold for $\|x - x_m\| \le 2\eta$ and that, for some constant $C_2$, $E(Y_{ij}|X_i = x) \ge C_2\|x - x_m\|^2$ for $\|x - x_m\| \le 2\eta$. This implies that, for $\|(s,t)\| \le h_n^{-1}\eta$,

\begin{align*}
g_n(s,t) &\ge \frac{C_2}{h_n^{d+2}}\int_{h_ns < x - x_m < h_n(s+t)}\|x - x_m\|^2 f_X(x)\,dx \\
&\ge \frac{C_2\underline{f}}{h_n^{d+2}}\int_{h_ns < x - x_m < h_n(s+t)}\|x - x_m\|^2\,dx = C_2\underline{f}\int_{s < x < s+t}\|x\|^2\,dx \ge C_3\|(s,t)\|^2\prod_i t_i
\end{align*}

where $C_3$ is a constant that depends only on $\underline{f}$ and $d$ and the last inequality follows from bounding the integral as explained in the proof of the previous theorem.

As in the proof of the previous theorem, let $S_k = \{k \le \|(s,t)\| \le k+1\}$ and let $S_k^{\delta} = S_k \cap \{\prod_i t_i \le (k+1)^{-\delta}\}$ for some fixed $\delta$. We have, using Lemma 4,

\begin{align*}
P\left(\inf_{S_k^{\delta}} G_n(s,t) + g_n(s,t) \le r\right) &\le P\left(\sup_{S_k^{\delta}} |G_n(s,t)| \ge |r|\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/k^{-\delta}] + 2\right\}^{2d}\exp\left(-C\frac{|r|}{(k+1)^{-\delta/2}}\right)
\end{align*}

for $(k+1)^{-\delta} \ge n^{-4/(d+4)}(1+\log n)^2 \iff k+1 \le n^{4/[\delta(d+4)]}(1+\log n)^{-2/\delta}$ so, if $\delta < 4$, this will hold eventually for all $(k+1) \le h_n^{-1}\eta$ (once $h_n^{-1}\eta \le n^{4/[\delta(d+4)]}(1+\log n)^{-2/\delta} \iff \eta \le n^{(4/\delta-1)/(d+4)}(1+\log n)^{-2/\delta}$). The bound is summable over $k$ for any $\delta > 0$.

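Again following the proof of the previous theorem, for $\alpha < \beta$, define $S_k^{\alpha,\beta} = S_k \cap \{(k+1)^{\alpha} < \prod_i t_i \le (k+1)^{\beta}\}$. We have, again using Lemma 4,

\begin{align*}
P\left(\inf_{S_k^{\alpha,\beta}} G_n(s,t) + g_n(s,t) \le r\right) &\le P\left(\sup_{S_k^{\alpha,\beta}} |G_n(s,t)| \ge C_3k^{2+\alpha}\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/((k+1)^{\beta}\wedge 1)] + 2\right\}^{2d}\exp\left(-CC_3\frac{k^{2+\alpha}}{(k+1)^{\beta/2}}\right)
\end{align*}

for $(k+1)^{\beta} \ge n^{-4/(d+4)}(1+\log n)^2$ (which will hold once the same inequality holds for $\delta$ for $-\delta < \beta$) and $k+1 \le h_n^{-1}\eta$. The bound is summable over $k$ for any $\alpha$, $\beta$ with $4 + 2\alpha - \beta > 0$.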

Thus, noting as in the previous theorem that, for any $-\delta < \alpha_1 < \alpha_2 < \cdots < \alpha_{\ell-1} < \alpha_{\ell} = d$, $S_k = S_k^{\delta} \cup S_k^{-\delta,\alpha_1} \cup S_k^{\alpha_1,\alpha_2} \cup \ldots \cup S_k^{\alpha_{\ell-1},\alpha_{\ell}}$, if we choose $\delta < 3/2$ and $\alpha_i = i$ for $i \in \{1,\ldots,d\}$ the probability of the infimum being less than or equal to $r$ over the sets indexed by $k$ for any $k \le h_n^{-1}\eta$ is bounded uniformly in $n$ by a sequence that is summable over $k$ (once $\eta \le n^{(4/\delta-1)/(d+4)}(1+\log n)^{-2/\delta}$). Thus, if we choose $M$ such that the tail of this sum past $M$ is less than $\varepsilon$ and let $N$ be large enough so that $\eta \le N^{(4/\delta-1)/(d+4)}(1+\log N)^{-2/\delta}$, we will have the desired result.



□ 



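To complete the proof of Theorem 1, we need to show that

\begin{align*}
Z_{n,2} \equiv n^{(d+2)/(d+4)}\Bigg(&\inf_{\|(s-x_m,t)\| > \eta \text{ all } m \text{ s.t. } 1 \in J(m)} E_nY_{i,1}I(s < X_i < s+t), \ \ldots, \\
&\inf_{\|(s-x_m,t)\| > \eta \text{ all } m \text{ s.t. } k \in J(m)} E_nY_{i,k}I(s < X_i < s+t)\Bigg) \stackrel{p}{\to} 0.
\end{align*}

This follows from the next two lemmas.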
Lemma 5. Under Assumptions 1 and 2, for any $\eta > 0$, there exists some $\underline{B} > 0$ such that $EY_{ij}I(s < X_i < s+t) \ge \underline{B}P(s < X_i < s+t)$ for all $(s,t)$ with $\|(s - x_m, t)\| > \eta$ for all $m$ with $j \in J(m)$.

Proof. Given $\eta > 0$, we can make $\eta$ smaller without weakening the result, so let $\eta$ be small enough that $\|x_m - x_r\|_\infty > 2\eta$ for all $m \ne r$ with $j \in J(m) \cap J(r)$ and $f_X$ satisfies $0 < \underline{f} \le f_X(x) \le \overline{f} < \infty$ for some $\underline{f}$ and $\overline{f}$ on $\{x \mid \|x - x_m\|_\infty \le \eta\}$. If $\|(s - x_m, t)\| > \eta$, then $\|(s - x_m, s+t - x_m)\|_\infty > \eta/(4d)$, so it suffices to show that $EY_{ij}I(s < X_i < s+t) \ge \underline{B}P(s < X_i < s+t)$ for all $(s,t)$ with $\|(s - x_m, s+t - x_m)\|_\infty > \eta/(4d)$. Let $\mu > 0$ be such that $E(Y_{ij}|X_i = x) > \mu$ when $\|x - x_m\|_\infty \ge \eta/(8d)$ for $m$ with $j \in J(m)$. For notational convenience, let $\delta = \eta/(4d)$.

For $m$ with $j \in J(m)$, let $B(x_m,\delta) = \{x \mid \|x - x_m\|_\infty \le \delta\}$ and $B(x_m,\delta/2) = \{x \mid \|x - x_m\|_\infty \le \delta/2\}$. First, I show that, for any $(s,t)$ with $\|(s - x_m, s+t - x_m)\|_\infty > \delta$, $P(\{s < X_i < s+t\} \cap B(x_m,\delta)\backslash B(x_m,\delta/2)) \ge (1/3)(\underline{f}/\overline{f})P(\{s < X_i < s+t\} \cap B(x_m,\delta/2))$. Intuitively, this holds because, taking any box with a corner outside of $B(x_m,\delta)$, this box has to intersect with a substantial proportion of $B(x_m,\delta)\backslash B(x_m,\delta/2)$ in order to intersect with $B(x_m,\delta/2)$.

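Formally, we have $\{s < x < s+t\} \cap B(x_m,\delta) = \{s \vee (x_m - \delta) < x < (s+t) \wedge (x_m + \delta)\}$, so that, letting $\lambda$ be the Lebesgue measure on $\mathbb{R}^d$, $\lambda(\{s < x < s+t\} \cap B(x_m,\delta)) = \prod_i[(s_i + t_i) \wedge (x_{m,i} + \delta) - s_i \vee (x_{m,i} - \delta)]$. Similarly, $\lambda(\{s < x < s+t\} \cap B(x_m,\delta/2)) = \prod_i[(s_i + t_i) \wedge (x_{m,i} + \delta/2) - s_i \vee (x_{m,i} - \delta/2)]$. For all $i$, $[(s_i + t_i) \wedge (x_{m,i} + \delta/2) - s_i \vee (x_{m,i} - \delta/2)] \le [(s_i + t_i) \wedge (x_{m,i} + \delta) - s_i \vee (x_{m,i} - \delta)]$. For some $r$, we must have $s_r \le x_{m,r} - \delta$ or $s_r + t_r \ge x_{m,r} + \delta$. For this $r$, we will have $[(s_r + t_r) \wedge (x_{m,r} + \delta/2) - s_r \vee (x_{m,r} - \delta/2)] \le 2[(s_r + t_r) \wedge (x_{m,r} + \delta) - s_r \vee (x_{m,r} - \delta)]/3$.

Thus, $\lambda(\{s < x < s+t\} \cap B(x_m,\delta/2)) \le 2\lambda(\{s < x < s+t\} \cap B(x_m,\delta))/3$. It then follows that $\lambda(\{s < x < s+t\} \cap B(x_m,\delta)\backslash B(x_m,\delta/2)) \ge (1/3)\lambda(\{s < x < s+t\} \cap B(x_m,\delta))$, so that $P(\{s < X_i < s+t\} \cap B(x_m,\delta)\backslash B(x_m,\delta/2)) \ge (1/3)(\underline{f}/\overline{f})P(\{s < X_i < s+t\} \cap B(x_m,\delta))$.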

Now, we use the fact that $E(Y_{ij}|X_i)$ is bounded away from zero outside of $B(x_m,\delta/2)$, and that the proportion of $\{s < x < s+t\}$ that intersects with $B(x_m,\delta/2)$ can't be too large. We have, for any $(s,t)$ with $\|(s - x_m, s+t - x_m)\|_\infty > \delta$,

\begin{align*}
EY_{ij}I(s < X_i < s+t) &\ge \mu P(\{s < X_i < s+t\}\backslash[\cup_m B(x_m,\delta/2)]) \\
&= \mu P(\{s < X_i < s+t\}\backslash[\cup_m B(x_m,\delta)]) + \mu\sum_m P(\{s < X_i < s+t\} \cap B(x_m,\delta)\backslash B(x_m,\delta/2)) \\
&\ge \mu P(\{s < X_i < s+t\}\backslash[\cup_m B(x_m,\delta)]) + \mu\sum_m (1/3)(\underline{f}/\overline{f})P(\{s < X_i < s+t\} \cap B(x_m,\delta)) \\
&\ge \mu(1/3)(\underline{f}/\overline{f})P(s < X_i < s+t)
\end{align*}

where the unions are taken over $m$ such that $j \in J(m)$. The equality in the second line follows because the sets $B(x_m,\delta)$ are disjoint.

□ 
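The one-coordinate step in the proof above (the half-radius intersection is at most two thirds of the full-radius intersection once an endpoint of the interval lies at or beyond the $\delta$-ball) can be checked numerically. The following is a minimal sketch on a grid; the helper names `clipped_length` and `worst_ratio` and the grid itself are assumptions made for illustration, and intersection lengths are clipped at zero to allow for empty intersections.

```python
import itertools


def clipped_length(lo, hi, center, radius):
    """Length of (lo, hi) intersected with [center - radius, center + radius]."""
    return max(0.0, min(hi, center + radius) - max(lo, center - radius))


def worst_ratio(center=0.0, delta=1.0, grid=81):
    """Largest ratio (half-radius length) / (full-radius length) over grid
    intervals (lo, hi) with lo <= center - delta or hi >= center + delta."""
    pts = [center - 2 * delta + 4 * delta * i / (grid - 1) for i in range(grid)]
    worst = 0.0
    for lo, hi in itertools.combinations(pts, 2):  # guarantees lo < hi
        if not (lo <= center - delta or hi >= center + delta):
            continue
        outer = clipped_length(lo, hi, center, delta)
        inner = clipped_length(lo, hi, center, delta / 2)
        if outer > 0:
            worst = max(worst, inner / outer)
    return worst


if __name__ == "__main__":
    print(worst_ratio())
```

On this grid the worst case occurs when one endpoint lies outside the $\delta$-ball and the other sits exactly on the boundary of the $\delta/2$-ball, which is where the constant $2/3$ comes from.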

Lemma 6. Let $S$ be any set in $\mathbb{R}^{2d}$ such that, for some $\mu > 0$ and all $(s,t) \in S$, $EY_{ij}I(s < X_i < s+t) \ge \mu P(s < X_i < s+t)$. Then, under Assumption 2, for any sequence $a_n \to \infty$ and $r < 0$,

\[
\inf_{(s,t) \in S}\frac{n}{a_n\log n}E_nY_{ij}I(s < X_i < s+t) > r
\]

with probability approaching 1.
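Proof. For $(s,t) \in S$,

\begin{align*}
&\frac{n}{a_n\log n}E_nY_{ij}I(s < X_i < s+t) \le r \\
&\Longrightarrow \frac{n}{a_n\log n}(E_n - E)Y_{ij}I(s < X_i < s+t) \le r - \frac{n}{a_n\log n}EY_{ij}I(s < X_i < s+t) \\
&\qquad \le r - \frac{n}{a_n\log n}\mu P(s < X_i < s+t) \le -\left[|r| \vee \frac{n}{a_n\log n}\mu P(s < X_i < s+t)\right] \\
&\Longrightarrow \left[\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\right]^{1/2}\left|(E_n - E)Y_{ij}I(s < X_i < s+t)\right| \\
&\qquad \ge \left[\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\right]^{1/2}\left[\frac{a_n\log n}{n}|r| \vee \mu P(s < X_i < s+t)\right].
\end{align*}

If $\frac{a_n\log n}{n} \ge P(s < X_i < s+t)$, then the last line is greater than or equal to $\frac{a_n\log n}{n}|r|$. If $\frac{a_n\log n}{n} \le P(s < X_i < s+t)$, the last line is greater than or equal to $\left[\frac{a_n\log n/n}{P(s < X_i < s+t)}\right]^{1/2}\mu P(s < X_i < s+t) = \left(\frac{a_n\log n}{n}\right)^{1/2}\mu\sqrt{P(s < X_i < s+t)} \ge \frac{a_n\log n}{n}\mu$. Thus,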



\begin{align*}
&P\left(\inf_{(s,t) \in S}\frac{n}{a_n\log n}E_nY_{ij}I(s < X_i < s+t) \le r\right) \\
&\le P\left(\sup_{(s,t) \in S}\left[\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\right]^{1/2}\left|(E_n - E)Y_{ij}I(s < X_i < s+t)\right| \ge (|r| \wedge \mu)\frac{a_n\log n}{n}\right).
\end{align*}

This converges to zero by Theorem 37 in Pollard (1984) with, in the notation of that theorem, $\mathcal{F}_n$ the class of functions of the form

\[
\left[\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\right]^{1/2}Y_{ij}I(s < X_i < s+t)
\]

with $(s,t) \in S$, $\delta_n = \left(\frac{a_n\log n}{n}\right)^{1/2}$ and $\alpha_n = 1$. To verify the conditions of the lemma, the covering number bound holds because each $\mathcal{F}_n$ is contained in the larger class $\mathcal{F}$ of functions of the form $wY_{ij}I(s < X_i < s+t)$ where $(s,t)$ ranges over $S$ and $w$ ranges over $\mathbb{R}$, and this larger class is a VC subgraph class. The supremum bound on functions in $\mathcal{F}_n$ holds by Assumption 2. To verify the bound on the $L^2$ norm of functions in $\mathcal{F}_n$, note that



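\begin{align*}
&E\left\{\left[\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\right]^{1/2}Y_{ij}I(s < X_i < s+t)\right\}^2 \\
&\le \overline{Y}^2\,\frac{a_n\log n/n}{(a_n\log n/n) \vee P(s < X_i < s+t)}\,P(s < X_i < s+t) \le \overline{Y}^2\,\frac{a_n\log n}{n} = \overline{Y}^2\delta_n^2
\end{align*}

with $\overline{Y}$ a bound on $|Y_{ij}|$, since $ab/(a \vee b) \le a$ for any $a, b > 0$.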



□ 

By Lemma 5, $\{(s,t) \mid \|(s - x_m,t)\| > \eta \text{ all } m \text{ s.t. } j \in J(m)\}$ satisfies the conditions of Lemma 6, so $E_nY_{ij}I(s < X_i < s+t)$ converges to zero at a $n/(a_n\log n)$ rate for any $a_n \to \infty$, which can be made faster than the $n^{(d+2)/(d+4)}$ rate needed to show that $Z_{n,2} \stackrel{p}{\to} 0$. This completes the proof of Theorem 1.



Inference 

I use the following lemma in the proof of Theorem 2.

Lemma 7. Let $\mathbb{H}$ be a Gaussian random process with sample paths that are almost surely in the set $C(T,\mathbb{R}^k)$ of continuous functions with respect to some semimetric on the index set $T$ with a countable dense subset $T_0$. Then, for any set $A \subseteq \mathbb{R}^k$ with Lebesgue measure zero, $P(\inf_{t \in T}\mathbb{H}(t) \in A) \le P(\inf_{t \in T,\,\det var(\mathbb{H}(t)) < \varepsilon}\mathbb{H}(t) \in A \text{ for all } \varepsilon > 0)$.

Proof. First, note that, if the infimum over $T$ is in $A$, then, since $\{t \in T \mid \det var(\mathbb{H}(t)) \ge \varepsilon\}$ and $\{t \in T \mid \det var(\mathbb{H}(t)) < \varepsilon\}$ partition $T$, the infimum over one of these sets must be in $A$. By Proposition 3.2 in Pitt and Tran (1979), the infimum of $\mathbb{H}(t)$ over the former set has a distribution that is continuous with respect to the Lebesgue measure, so the probability of the infimum of $\mathbb{H}(t)$ over this set being in $A$ is zero. Thus, $P(\inf_{t \in T}\mathbb{H}(t) \in A) \le P(\inf_{t \in T,\,\det var(\mathbb{H}(t)) < \varepsilon}\mathbb{H}(t) \in A)$. Taking $\varepsilon$ to zero along a countable sequence gives the result.

□ 

Proof of Theorem 2. For $m$ from 1 to $\ell$, let $\{j_{m,1},\ldots,j_{m,|J(m)|}\} = J(m)$. Then, letting

\begin{align*}
\tilde{Z} \equiv \Big(&\inf_{s,t} G_{P,x_1,j_{1,1}}(s,t) + g_{P,x_1,j_{1,1}}(s,t), \ldots, \inf_{s,t} G_{P,x_1,j_{1,|J(1)|}}(s,t) + g_{P,x_1,j_{1,|J(1)|}}(s,t), \ldots, \\
&\inf_{s,t} G_{P,x_\ell,j_{\ell,1}}(s,t) + g_{P,x_\ell,j_{\ell,1}}(s,t), \ldots, \inf_{s,t} G_{P,x_\ell,j_{\ell,|J(\ell)|}}(s,t) + g_{P,x_\ell,j_{\ell,|J(\ell)|}}(s,t)\Big),
\end{align*}

each element of $Z$ is the minimum of the elements of some subvector of $\tilde{Z}$, where the subvectors corresponding to different elements of $Z$ do not overlap. Thus, it suffices to show that $\tilde{Z}$ has an absolutely continuous distribution. For this, it suffices to show that, for each $m$,

\[
\left(\inf_{s,t} G_{P,x_m,j_{m,1}}(s,t) + g_{P,x_m,j_{m,1}}(s,t), \ldots, \inf_{s,t} G_{P,x_m,j_{m,|J(m)|}}(s,t) + g_{P,x_m,j_{m,|J(m)|}}(s,t)\right)
\]

has an absolutely continuous distribution, since these are independent across $m$.

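To this end, fix $m$ and let $\mathbb{H}(s,t)$ be the random process with sample paths in $C(\mathbb{R}^{2d},\mathbb{R}^{|J(m)|})$ defined by

\[
\mathbb{H}(s,t) = \left(G_{P,x_m,j_{m,1}}(s,t) + g_{P,x_m,j_{m,1}}(s,t), \ldots, G_{P,x_m,j_{m,|J(m)|}}(s,t) + g_{P,x_m,j_{m,|J(m)|}}(s,t)\right).
\]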

By Assumption H] far (EI(s, t)) = M]^. for some positive definite matrix M, so that 
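$\det var(\mathbb{H}(s,t)) = (\det M)\left(\prod_i t_i\right)^{|J(m)|}$. Thus, $\inf_{(s,t) \in \mathbb{R}^{2d},\,\det var(\mathbb{H}(s,t)) < \varepsilon}\mathbb{H}(s,t) \in A$ for all $\varepsilon > 0$ iff. $\inf_{(s,t) \in \mathbb{R}^{2d},\,\prod_i t_i < \varepsilon}\mathbb{H}(s,t) \in A$ for all $\varepsilon > 0$ so, by Lemma 7, $P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A) \le P(\inf_{(s,t) \in \mathbb{R}^{2d},\,\prod_i t_i < \varepsilon}\mathbb{H}(s,t) \in A \text{ for all } \varepsilon > 0)$. For each $j$, $\prod_i t_i$ is equal to $var(\mathbb{H}_j(s,t)) = \rho_j(0,(s,t))$ times some constant, where $\rho_j$ is the covariance semimetric for component $j$ given by $\rho_j((s,t),(s',t')) = var(\mathbb{H}_j(s,t) - \mathbb{H}_j(s',t'))$. Thus, there exists a constant $C$ such that $\prod_i t_i < \varepsilon$ implies $\rho_j(0,(s,t)) < C\varepsilon$ for all $j$, so that $P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A) \le P(\inf_{(s,t) \in \mathbb{R}^{2d},\,\rho_j(0,(s,t)) < C\varepsilon \text{ all } j}\mathbb{H}(s,t) \in A \text{ for all } \varepsilon > 0)$.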

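Since the sample paths of $\mathbb{H}$ are almost surely continuous with respect to the semimetric $\max_j\rho_j((s,t),(s',t'))$ on the set $\|(s,t)\| \le M$ for any finite $M$, $\inf_{\|(s,t)\| \le M,\,\rho_j(0,(s,t)) < C\varepsilon \text{ all } j}\mathbb{H}(s,t) \in A$ for all $\varepsilon > 0$ implies that $\mathbb{H}(0) = 0$ is a limit point of $A$ on this probability one set. Thus, for any set $A$ that does not have zero as a limit point, $P(\inf_{\|(s,t)\| \le M}\mathbb{H}(s,t) \in A) = 0$ for any finite $M$. Applying this to $A\backslash B_\eta(0)$ where $B_\eta(0)$ is the $\eta$-ball around 0 in $\mathbb{R}^{|J(m)|}$, we have

\begin{align*}
P\left(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A\right) &= P\left(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A \cap B_\eta(0)\right) + P\left(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A\backslash B_\eta(0)\right) \\
&\le P\left(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A \cap B_\eta(0)\right) + P\left(\inf_{\|(s,t)\| \le M}\mathbb{H}(s,t) \in A\backslash B_\eta(0)\right) \\
&\qquad + P\left(\inf_{\|(s,t)\| > M}\mathbb{H}(s,t) \in A\backslash B_\eta(0)\right) \\
&= P\left(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A \cap B_\eta(0)\right) + P\left(\inf_{\|(s,t)\| > M}\mathbb{H}(s,t) \in A\backslash B_\eta(0)\right).
\end{align*}

Noting that $P(\inf_{\|(s,t)\| > M}\mathbb{H}(s,t) \in A\backslash B_\eta(0))$ can be made arbitrarily small by making $M$ large, this shows that $P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A) = P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A \cap B_\eta(0))$. Taking $\eta$ to zero along a countable sequence, this shows that $P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A) \le P(\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t) \in A \cap \{0\})$ so that $\inf_{(s,t) \in \mathbb{R}^{2d}}\mathbb{H}(s,t)$ has an absolutely continuous distribution with a possible atom at zero.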

To show that there can be no atom at zero, we argue as follows. Fix $j \in J(m)$. The component of $\mathbb{H}$ corresponding to this $j$ is $G_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t)$. For some constant $K$, for any $k > 0$, letting $s_{i,k} = (i/k, 0, \ldots, 0)$ and $t_k = (1/k, 1, \ldots, 1)$, we will have $g_{P,x_m,j}(s_{i,k},t_k) \le K/k$ for $i \le k$, so that

\begin{align*}
&P\left(\inf_{(s,t) \in \mathbb{R}^{2d}} G_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t) = 0\right) = P\left(\inf_{(s,t) \in \mathbb{R}^{2d}} G_{P,x_m,j}(s,t) + g_{P,x_m,j}(s,t) \ge 0\right) \\
&\le P\left(G_{P,x_m,j}(s_{i,k},t_k) + g_{P,x_m,j}(s_{i,k},t_k) \ge 0 \text{ all } i \in \{0,\ldots,k\}\right) \\
&\le P\left(G_{P,x_m,j}(s_{i,k},t_k) + K/k \ge 0 \text{ all } i \in \{0,\ldots,k\}\right) \\
&= P\left(\sqrt{k}\,G_{P,x_m,j}(s_{i,k},t_k) + K/\sqrt{k} \ge 0 \text{ all } i \in \{0,\ldots,k\}\right) \\
&= P\left(G_{P,x_m,j}(s_{i,1},t_1) + K/\sqrt{k} \ge 0 \text{ all } i \in \{0,\ldots,k\}\right).
\end{align*}

The final line is the probability of $k+1$ iid normal random variables each being greater than or equal to $-K/\sqrt{k}$, which can be made arbitrarily small by making $k$ large. □
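To see how quickly the probability in the final line vanishes, the sketch below evaluates it for a few values of $k$. It assumes, purely for illustration, that the $k+1$ iid normal variables have unit variance (in the proof the variance is a fixed constant determined by the covariance kernel), and the names `std_normal_cdf` and `prob_all_above` are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_all_above(k, K=1.0):
    """P(Z_i >= -K/sqrt(k) for i = 0, ..., k) for k+1 iid N(0,1) draws.
    Unit variance is an assumption made only for this illustration."""
    p_single = std_normal_cdf(K / math.sqrt(k))  # P(Z >= -c) = Phi(c)
    return p_single ** (k + 1)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, prob_all_above(k))
```

Each factor tends to $1/2$ from above as $k$ grows, so the product of $k+1$ factors decays geometrically.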



Proof of Theorem 3. This follows immediately from the continuity of the asymptotic distribution (see Politis, Romano, and Wolf, 1999). □



Proof of Theorem 4. It suffices to show that, for every subsequence, there exists a further subsequence along which the distribution of $\hat{Z}$ converges weakly to the distribution of $Z$. Given a subsequence, let the further subsequence be such that the convergence in probability in Assumption 6 is with probability one.

For any fixed $B > 0$, the processes



are, along this subsequence, Gaussian processes with mean functions and covariance kernels 
converging with probability one to those of the distribution being estimated uniformly in 
||(s,t)|| < B. Thus, with probability one, the distributions of these processes converge 
weakly to the distribution of the process being estimated along this subsequence taken as 
random processes on ||(s,t)|| < B. Thus, to get the weak convergence of the elementwise 
infimum, we just need to verify part (ii) of Lemma 2. To this end, note that, along the further subsequence, the infimum of

\[
\left[\hat{G}_{P,x_k,j}(s,t) + \hat{g}_{P,x_k,j}(s,t)\right]I(\|(s,t)\| \le B_n)
\]

is eventually bounded from below (in the stochastic dominance sense) by the infimum of a process defined the same way as

\[
G_{P,x_k,j}(s,t) + g_{P,x_k,j}(s,t),
\]

but with $E(m_{J(k)}(W_i,\theta)m_{J(k)}(W_i,\theta)'|X = x_k)$ replaced by $2E(m_{J(k)}(W_i,\theta)m_{J(k)}(W_i,\theta)'|X = x_k)$, and $V(x_k)$ replaced by $V(x_k)/2$. Once $n$ is large enough that this holds along this further subsequence, part (ii) of Lemma 2 will hold by Theorem 16 applied to this process. □

Proof of Corollary 1. By Theorem 4, the distribution of $S(\hat{Z})$ converges weakly conditionally in probability to the distribution of $S(Z)$, and by Theorem 1, $n^{(d_X+2)/(d_X+4)}S(T_n(\theta)) \stackrel{d}{\to} S(Z)$. $S(Z)$ has a continuous distribution by Theorem 2, so the result follows by standard arguments. □



Other Shapes of the Conditional Mean 

This section contains the proofs of the results in Section 5, which extend the results of Section 3 to other shapes of the conditional mean. First, I show how Assumption 1 implies Assumption 7 with $\gamma = 2$. Next, I prove Theorem 5, which gives an interpretation of Assumption 7 in terms of conditions on the number of bounded derivatives in the one dimensional case. Finally, I prove Theorem 6, which derives the asymptotic distribution of the KS statistic under these assumptions. The proof is mostly the same as the proof of Theorem 1, and I present only the parts of the proof that differ, referring to the proof of Theorem 1 for the parts that do not need to be changed.

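To see that, under part (ii) from Assumption 1, Assumption 7 will hold with $\gamma = 2$, note that, by a second order Taylor expansion, for some $x^*(x)$ between $x$ and $x_k$,

\[
\frac{\bar{m}_j(\theta,x) - \bar{m}_j(\theta,x_k)}{\|x - x_k\|^2} = \frac{(x - x_k)'V_j(x^*(x))(x - x_k)}{2\|x - x_k\|^2} = \frac{1}{2}\,\frac{(x - x_k)'}{\|x - x_k\|}V_j(x^*(x))\frac{x - x_k}{\|x - x_k\|}.
\]

Thus, letting $\psi_{j,k}(t) = \frac{1}{2}t'V_j(x_k)t$, we have

\begin{align*}
&\sup_{\|x - x_k\| \le \delta}\left|\frac{\bar{m}_j(\theta,x) - \bar{m}_j(\theta,x_k)}{\|x - x_k\|^2} - \psi_{j,k}\left(\frac{x - x_k}{\|x - x_k\|}\right)\right| \\
&= \sup_{\|x - x_k\| \le \delta}\left|\frac{1}{2}\,\frac{(x - x_k)'}{\|x - x_k\|}V_j(x^*(x))\frac{x - x_k}{\|x - x_k\|} - \frac{1}{2}\,\frac{(x - x_k)'}{\|x - x_k\|}V_j(x_k)\frac{x - x_k}{\|x - x_k\|}\right|.
\end{align*}

This goes to zero as $\delta \to 0$ by the continuity of the second derivative matrix.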

The proof of Theorem 5 below shows that, in the one dimensional case, Assumption 7 follows more generally from conditions on higher order derivatives.

Proof of Theorem 5. It suffices to consider the case where $d_Y = 1$. First, suppose that $\mathcal{X}_0$ has infinitely many elements. Let $\{x_k\}_{k=1}^{\infty}$ be a nonrepeating sequence of elements in $\mathcal{X}_0$. Since $\mathcal{X}_0$ is compact, this sequence must have a subsequence that converges to some $\tilde{x} \in \mathcal{X}_0$. If $\bar{m}(\theta,x)$ had a nonzero $r$th derivative at $\tilde{x}$ for some $r < p$, then, by Lemma 8 below, $\bar{m}(\theta,x)$ would be strictly greater than $\bar{m}(\theta,\tilde{x})$ for $x$ in some neighborhood of $\tilde{x}$, a contradiction. Thus, a $p$th order Taylor expansion gives, using the notation $D_r(x) = \partial^r/\partial x^r\,\bar{m}(\theta,x)$ for $r \le p$, $\bar{m}(\theta,x) - \bar{m}(\theta,\tilde{x}) = D_p(x^*(x))(x - \tilde{x})^p/p! \le \overline{D}|x - \tilde{x}|^p/p!$ where $\overline{D}$ is a bound on the $p$th derivative and $x^*(x)$ is some value between $x$ and $\tilde{x}$.

If $\mathcal{X}_0$ has finitely many elements, then, for each $x_0 \in \mathcal{X}_0$, a $p$th order Taylor expansion gives $\bar{m}(\theta,x) - \bar{m}(\theta,x_0) = D_1(x_0)(x - x_0) + \frac{1}{2}D_2(x_0)(x - x_0)^2 + \cdots + \frac{1}{p!}D_p(x^*(x))(x - x_0)^p$. If, for some $r < p$, $D_r(x_0) \ne 0$ and $D_{r'}(x_0) = 0$ for $r' < r$, then Assumption 7 will hold at $x_0$ with $\gamma = r$. If not, we will have $\bar{m}(\theta,x) - \bar{m}(\theta,x_0) \le \overline{D}|x - x_0|^p/p!$ for all $x$. □

Lemma 8. Suppose that $g : [\underline{x},\overline{x}] \subseteq \mathbb{R} \to \mathbb{R}$ is minimized at some $x_0$. If the least nonzero derivative of $g$ is continuous at $x_0$, then, for some $\varepsilon > 0$, $g(x) > g(x_0)$ for $|x - x_0| \le \varepsilon$, $x \ne x_0$.

Proof. Let $p$ be the least integer such that the $p$th derivative $g^{(p)}(x_0)$ is nonzero. By a $p$th order Taylor expansion, $g(x) - g(x_0) = g^{(p)}(x^*(x))(x - x_0)^p/p!$ for some $x^*(x)$ between $x$ and $x_0$. By continuity of $g^{(p)}(x)$, $|g^{(p)}(x^*(x)) - g^{(p)}(x_0)| \le |g^{(p)}(x_0)|/2$ for $x$ close enough to $x_0$, so that $g(x) - g(x_0) = g^{(p)}(x^*(x))(x - x_0)^p/p! \ge |g^{(p)}(x_0)|\,|x - x_0|^p/(2\,p!) > 0$ (the $p$th derivative must have the same sign as $x - x_0$ if $p$ is odd in order for $g$ to be minimized at $x_0$). □

I now prove Theorem 6. I prove the theorem under the assumption that $\gamma(j,k) = \gamma$ for all $(j,k)$ with $j \in J(k)$. The general case follows from applying the argument to neighborhoods of each $x_k$, and getting faster rates of convergence for $(j,k)$ such that $\gamma(j,k) < \gamma$. The proof is the same as the proof of Theorem 1 with the following modifications.

First, Theorem 15 must be modified to the following theorem, with the new definition of $g_{P,x_k,j}(s,t)$.

Theorem 18. Let $h_n = n^{-\beta}$ for some $0 < \beta < 1/d_X$. Let

\[
G_{n,x_m}(s,t) = \frac{\sqrt{n}}{h_n^{d/2}}(E_n - E)Y_{i,J(m)}I(h_ns < X_i - x_m < h_n(s+t))
\]

and let $g_{n,x_m}(s,t)$ have $j$th element

\[
g_{n,x_m,j}(s,t) = \frac{1}{h_n^{d+\gamma}}EY_{ij}I(h_ns < X_i - x_m < h_n(s+t))
\]

if $j \in J(m)$ and zero otherwise. Then, for any finite $M$, $(G_{n,x_1}(s,t),\ldots,G_{n,x_\ell}(s,t)) \stackrel{d}{\to} (G_{P,x_1}(s,t),\ldots,G_{P,x_\ell}(s,t))$ taken as random processes on $\|(s,t)\| \le M$ with the supremum norm and $g_{n,x_m}(s,t) \to g_{P,x_m}(s,t)$ uniformly in $\|(s,t)\| \le M$ where $G_{P,x_m}(s,t)$ and $g_{P,x_m}(s,t)$ are defined as in Theorem 6 for $m$ from 1 to $\ell$.

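Proof. The proof of the first display is the same. For the proof of the claim regarding $g_{n,x_m}(s,t)$, we have

\begin{align*}
g_{n,x_m,j}(s,t) &= \frac{1}{h_n^{d+\gamma}}\int_{h_ns < x - x_m < h_n(s+t)}\psi_{j,k}\left(\frac{x - x_m}{\|x - x_m\|}\right)\|x - x_m\|^{\gamma}f_X(x_m)\,dx \\
&\quad + \frac{1}{h_n^{d+\gamma}}\int_{h_ns < x - x_m < h_n(s+t)}\psi_{j,k}\left(\frac{x - x_m}{\|x - x_m\|}\right)\|x - x_m\|^{\gamma}[f_X(x) - f_X(x_m)]\,dx \\
&\quad + \frac{1}{h_n^{d+\gamma}}\int_{h_ns < x - x_m < h_n(s+t)}\left[\bar{m}_j(\theta,x) - \bar{m}_j(\theta,x_m) - \psi_{j,k}\left(\frac{x - x_m}{\|x - x_m\|}\right)\|x - x_m\|^{\gamma}\right]f_X(x)\,dx.
\end{align*}

The first term is equal to $g_{P,x_m,j}(s,t)$ by a change of variable $x$ to $h_nx + x_m$ in the integral. The second term is bounded by $g_{P,x_m,j}(s,t)\sup_{\|x - x_m\| \le 2h_nM}[f_X(x) - f_X(x_m)]/f_X(x_m)$, which goes to zero uniformly in $\|(s,t)\| \le M$ by continuity of $f_X$. The third term is equal to (using the same change of variables)

\begin{align*}
&\int_{s < x < s+t}\left[\frac{\bar{m}_j(\theta,h_nx + x_m) - \bar{m}_j(\theta,x_m)}{h_n^{\gamma}} - \psi_{j,k}\left(\frac{x}{\|x\|}\right)\|x\|^{\gamma}\right]f_X(h_nx + x_m)\,dx \\
&= \int_{s < x < s+t}\|x\|^{\gamma}\left[\frac{\bar{m}_j(\theta,h_nx + x_m) - \bar{m}_j(\theta,x_m)}{\|h_nx\|^{\gamma}} - \psi_{j,k}\left(\frac{x}{\|x\|}\right)\right]f_X(h_nx + x_m)\,dx.
\end{align*}

For $\|(s,t)\| \le M$, this is bounded by a constant times

\[
\sup_{\|x\| \le 2M}\left|\frac{\bar{m}_j(\theta,h_nx + x_m) - \bar{m}_j(\theta,x_m)}{\|h_nx\|^{\gamma}} - \psi_{j,k}\left(\frac{x}{\|x\|}\right)\right|
\]

which goes to zero as $n \to \infty$ by Assumption 7.

□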



The drift term and the mean zero term will be of the same order of magnitude if $\sqrt{n}/h_n^{d/2} = 1/h_n^{d+\gamma} \iff h_n = n^{-1/(d+2\gamma)}$, so that

\begin{align*}
&n^{(d+\gamma)/(d+2\gamma)}\left(E_nY_{i,J(1)}I(h_ns < X - x_1 < h_n(s+t)), \ldots, E_nY_{i,J(\ell)}I(h_ns < X - x_\ell < h_n(s+t))\right) \\
&= \left(G_{n,x_1}(s,t) + g_{n,x_1}(s,t), \ldots, G_{n,x_\ell}(s,t) + g_{n,x_\ell}(s,t)\right) \\
&\stackrel{d}{\to} \left(G_{P,x_1}(s,t) + g_{P,x_1}(s,t), \ldots, G_{P,x_\ell}(s,t) + g_{P,x_\ell}(s,t)\right)
\end{align*}

taken as stochastic processes in $\{\|(s,t)\| \le M\}$ with the supremum norm. From now on, let $h_n = n^{-1/(d+2\gamma)}$ so that this will hold.
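The balancing argument behind this bandwidth can be checked with exact rational arithmetic. The sketch below is an illustration only, and the function names are hypothetical. Writing $h_n = n^{-b}$, the scaling $\sqrt{n}/h_n^{d/2}$ of the mean zero term is $n^{1/2 + bd/2}$ and the scaling $1/h_n^{d+\gamma}$ of the drift term is $n^{b(d+\gamma)}$; the two agree at $b = 1/(d+2\gamma)$, where both equal $n^{(d+\gamma)/(d+2\gamma)}$.

```python
from fractions import Fraction


def bandwidth_exponent(d, gamma):
    """b such that h_n = n**(-b) equates sqrt(n)/h**(d/2) and 1/h**(d+gamma)."""
    return Fraction(1) / (Fraction(d) + 2 * Fraction(gamma))


def rate_exponent(d, gamma):
    """Exponent of n in the scaling n**((d+gamma)/(d+2*gamma))."""
    return (Fraction(d) + Fraction(gamma)) / (Fraction(d) + 2 * Fraction(gamma))


def scaling_exponents(d, gamma):
    """Exponents of n in sqrt(n)/h**(d/2) and 1/h**(d+gamma) at h = n**(-b)."""
    b = bandwidth_exponent(d, gamma)
    mean_zero = Fraction(1, 2) + b * Fraction(d) / 2
    drift = b * (Fraction(d) + Fraction(gamma))
    return mean_zero, drift


if __name__ == "__main__":
    # gamma = 2 recovers the (d+2)/(d+4) rate of the twice differentiable case.
    print(rate_exponent(1, 2), scaling_exponents(1, 2))
```

With $\gamma = 2$ this reproduces the bandwidth $n^{-1/(d+4)}$ and the rate $n^{(d+2)/(d+4)}$ used in the proof of Theorem 1.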

Lemmas 3 and 4 hold as stated, except for the condition in Lemma 4 that $\varepsilon \ge n^{-4/(d+4)}(1+\log n)^2$ must be replaced by $\varepsilon \ge n^{-2\gamma/(d+2\gamma)}(1+\log n)^2$ so that $h_n^{d/2}2^{d/2}\varepsilon^{1/2} \ge n^{-1/2}(1+\log n)$, which implies the fourth inequality in the last display in the proof of this lemma, holds for the sequence $h_n$ in the general case.

The next part of the proof that needs to be modified is the proofs of Theorems 16 and 17. For this, note that, for some constants $C_4$ and $\eta > 0$,



\[
g_{P,x_m,j}(s,t) \ge C_4\|(s,t)\|^{\gamma}\prod_i t_i \tag{2}
\]

and, for $\|(s,t)\| \le \eta/h_n$,

\[
g_{n,x_m,j}(s,t) \ge C_4\|(s,t)\|^{\gamma}\prod_i t_i \tag{3}
\]

for all $m$ and $j$. To see this, note that

\begin{align*}
g_{n,x_m,j}(s,t) &= \frac{1}{h_n^{d+\gamma}}EY_{ij}I(h_ns < X_i - x_m < h_n(s+t)) \\
&= \frac{1}{h_n^{d+\gamma}}\int_{h_ns < x - x_m < h_n(s+t)}\bar{m}_j(\theta,x)f_X(x)\,dx = \int_{s < x < s+t}\frac{\bar{m}_j(\theta,h_nx + x_m)}{\|h_nx\|^{\gamma}}\|x\|^{\gamma}f_X(h_nx + x_m)\,dx
\end{align*}

where the last equality follows from the change of variables $x$ to $h_nx + x_m$. For small enough $\eta$, this is greater than or equal to $\frac{1}{2}\int_{s < x < s+t}\psi_{j,k}\left(\frac{x}{\|x\|}\right)\|x\|^{\gamma}f_X(x_m)\,dx$ for $\|(s,t)\| \le \eta/h_n$ by Assumption 7 and the continuity of $f_X$. By definition, $g_{P,x_m,j}(s,t)$ is also greater than or equal to a constant times $\int_{s < x < s+t}\|x\|^{\gamma}\,dx$. To see that this is greater than or equal to a constant times $\|(s,t)\|^{\gamma}\prod_i t_i$, note that the Euclidean norm is equivalent to the norm $(s,t) \mapsto \max_i\max\{|s_i|,|s_i + t_i|\}$ and let $i^*$ be an index such that $|s_{i^*}| = \max_i\max\{|s_i|,|s_i + t_i|\}$ or $|s_{i^*} + t_{i^*}| = \max_i\max\{|s_i|,|s_i + t_i|\}$. In the former case, we will have $\|x\| \ge |s_{i^*}|/2$ for $x$ on the set $\{s_{i^*} \le x_{i^*} \le s_{i^*} + |s_{i^*}|/2\} \cap \{s < x < s+t\}$, which has Lebesgue measure at least $\left(\prod_{i \ne i^*}t_i\right)\cdot t_{i^*}/4$, so that $\int_{s < x < s+t}\|x\|^{\gamma}\,dx \ge \left(\max_i\max\{|s_i|,|s_i + t_i|\}/2\right)^{\gamma}\prod_i t_i/4$, and a symmetric argument holds in the latter case.

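With these inequalities in hand, the modified proofs of Theorems 16 and 17 are as follows.

Proof of Theorem 16 for general case. Let $G(s,t) = G_{P,x_m,j}(s,t)$ and $g(s,t) = g_{P,x_m,j}(s,t)$. Let $S_k = \{k \le \|(s,t)\| \le k+1\}$ and let $S_k^{\delta} = S_k \cap \{\prod_i t_i \le (k+1)^{-\delta}\}$ for some fixed $\delta$. By Lemma 3,

\begin{align*}
P\left(\inf_{S_k^{\delta}} G(s,t) + g(s,t) \le r\right) &\le P\left(\sup_{S_k^{\delta}} |G(s,t)| \ge |r|\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/k^{-\delta}] + 2\right\}^{2d}\exp\left(-Cr^2(k+1)^{\delta}\right)
\end{align*}

for $k$ large enough where $C$ depends only on $d$. Thus, the probability for the infimum over each $S_k^{\delta}$ is summable over $k$.

For any $\underline{\beta}$ and $\overline{\beta}$ with $\underline{\beta} < \overline{\beta}$, let $S_k^{\underline{\beta},\overline{\beta}} = S_k \cap \{(k+1)^{\underline{\beta}} < \prod_i t_i \le (k+1)^{\overline{\beta}}\}$. Using Lemma 3 and (2),

\begin{align*}
P\left(\inf_{S_k^{\underline{\beta},\overline{\beta}}} G(s,t) + g(s,t) \le r\right) &\le P\left(\sup_{S_k^{\underline{\beta},\overline{\beta}}} |G(s,t)| \ge C_4k^{\gamma+\underline{\beta}}\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/((k+1)^{\overline{\beta}}\wedge 1)] + 2\right\}^{2d}\exp\left(-CC_4^2\frac{k^{2\gamma+2\underline{\beta}}}{(k+1)^{\overline{\beta}}}\right).
\end{align*}

This is summable over $k$ if $2\gamma + 2\underline{\beta} - \overline{\beta} > 0$.

Now, note that, since $\prod_i t_i \le (k+1)^d$ on $S_k$, we have, for any $-\delta < \beta_1 < \beta_2 < \cdots < \beta_{\ell-1} < \beta_{\ell} = d$, $S_k = S_k^{\delta} \cup S_k^{-\delta,\beta_1} \cup S_k^{\beta_1,\beta_2} \cup \ldots \cup S_k^{\beta_{\ell-1},\beta_{\ell}}$. If we choose $0 < \delta < \gamma$, $\beta_1 = 0$, $\beta_2 = \gamma$, and $\beta_{i+1} = (2\beta_i) \wedge d$ for $i \ge 2$, the arguments above will show that the probability of the infimum being less than or equal to $r$ over $S_k^{\delta}$, $S_k^{-\delta,\beta_1}$ and each $S_k^{\beta_i,\beta_{i+1}}$ is summable over $k$, so that $P\left(\inf_{S_k} G(s,t) + g(s,t) \le r\right)$ is summable over $k$, so setting $M$ such that the tail of this sum past $M$ is less than $\varepsilon$ gives the desired result. □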

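Proof of Theorem 17 for the general case. Let $G_n(s,t) = G_{n,x_m,j}(s,t)$ and $g_n(s,t) = g_{n,x_m,j}(s,t)$.

Let $\eta$ be small enough that (3) holds.

As in the proof of the previous theorem, let $S_k = \{k \le \|(s,t)\| \le k+1\}$ and let $S_k^{\delta} = S_k \cap \{\prod_i t_i \le (k+1)^{-\delta}\}$ for some fixed $\delta$. We have, using Lemma 4,

\begin{align*}
P\left(\inf_{S_k^{\delta}} G_n(s,t) + g_n(s,t) \le r\right) &\le P\left(\sup_{S_k^{\delta}} |G_n(s,t)| \ge |r|\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/k^{-\delta}] + 2\right\}^{2d}\exp\left(-C\frac{|r|}{(k+1)^{-\delta/2}}\right)
\end{align*}

for $(k+1)^{-\delta} \ge n^{-2\gamma/(d+2\gamma)}(1+\log n)^2 \iff k+1 \le n^{2\gamma/[\delta(d+2\gamma)]}(1+\log n)^{-2/\delta}$ so, if $\delta < 2\gamma$, this will hold eventually for all $(k+1) \le h_n^{-1}\eta$ (once $h_n^{-1}\eta \le n^{2\gamma/[\delta(d+2\gamma)]}(1+\log n)^{-2/\delta} \iff \eta \le n^{2\gamma/[\delta(d+2\gamma)]}n^{-1/(d+2\gamma)}(1+\log n)^{-2/\delta} = n^{(2\gamma/\delta-1)/(d+2\gamma)}(1+\log n)^{-2/\delta}$). The bound is summable over $k$ for any $\delta > 0$.

Again following the proof of the previous theorem, for $\underline{\beta} < \overline{\beta}$, define $S_k^{\underline{\beta},\overline{\beta}} = S_k \cap \{(k+1)^{\underline{\beta}} < \prod_i t_i \le (k+1)^{\overline{\beta}}\}$. We have, again using Lemma 4,

\begin{align*}
P\left(\inf_{S_k^{\underline{\beta},\overline{\beta}}} G_n(s,t) + g_n(s,t) \le r\right) &\le P\left(\sup_{S_k^{\underline{\beta},\overline{\beta}}} |G_n(s,t)| \ge C_4k^{\gamma+\underline{\beta}}\right) \\
&\le 2\left\{3(k+1)[(k+1)^d/((k+1)^{\overline{\beta}}\wedge 1)] + 2\right\}^{2d}\exp\left(-CC_4\frac{k^{\gamma+\underline{\beta}}}{(k+1)^{\overline{\beta}/2}}\right)
\end{align*}

for $(k+1)^{\overline{\beta}} \ge n^{-2\gamma/(d+2\gamma)}(1+\log n)^2$ (which will hold once the same inequality holds for $\delta$ for $-\delta < \overline{\beta}$) and $k+1 \le h_n^{-1}\eta$. The bound is summable over $k$ for any $\underline{\beta}$, $\overline{\beta}$ with $2\gamma + 2\underline{\beta} - \overline{\beta} > 0$.

Thus, noting as in the previous theorem that, for any $-\delta < \beta_1 < \beta_2 < \cdots < \beta_{\ell-1} < \beta_{\ell} = d$, $S_k = S_k^{\delta} \cup S_k^{-\delta,\beta_1} \cup S_k^{\beta_1,\beta_2} \cup \ldots \cup S_k^{\beta_{\ell-1},\beta_{\ell}}$, if we choose $0 < \delta < \gamma$, $\beta_1 = 0$, $\beta_2 = \gamma$, and $\beta_{i+1} = (2\beta_i) \wedge d$ for $i \ge 2$, the arguments above will show that the probability of the infimum being less than or equal to $r$ over the sets indexed by $k$ for any $k \le h_n^{-1}\eta$ is bounded uniformly in $n$ by a sequence that is summable over $k$ (once $\eta \le n^{(2\gamma/\delta-1)/(d+2\gamma)}(1+\log n)^{-2/\delta}$). Thus, if we choose $M$ such that the tail of this sum past $M$ is less than $\varepsilon$ and let $N$ be large enough so that $\eta \le N^{(2\gamma/\delta-1)/(d+2\gamma)}(1+\log N)^{-2/\delta}$, we will have the desired result.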



Lemmas 5 and 6 hold as stated with the same proofs, so the rest of the proof is the same as in the $\gamma = 2$ case. The $n/(a_n\log n)$ rate for $Z_{n,2}$ is still faster than the $n^{(d+\gamma)/(d+2\gamma)}$ rate for $a_n$ increasing slowly enough.

The proof of Theorem 2 for the limiting process is the same as before. The only place the drift term is used is in ensuring that the inequality $g_{P,x_m,j}(s_{i,k},t_k) \le K/k$ holds in the last display in the proof of the theorem, which is still the case.

Testing Rate of Convergence Conditions: Subsampling 

First, I collect results on the rate estimate $\hat{\beta}$ defined in (1). The next lemma bounds $\hat{\beta}$ when the statistic may not converge at a polynomial rate. Throughout the following, $S_n$ is a statistic on $\mathbb{R}$ with cdf $J_n(x)$ and quantile function $J_n^{-1}(t)$. $L_{n,b}(x|\tau)$ and $\tilde{L}_{n,b}(x|\tau)$ are defined as in the body of the paper, with $S(T_n(\theta))$ replaced by $S_n$.

Lemma 9. Let $S_n$ be a statistic such that, for some sequence $\tau_n$ and $x > 0$, $\tau_nJ_n^{-1}(t) \ge x$ for large enough $n$. Then, if $\tau_bS_n \stackrel{p}{\to} 0$ and $b/n \to 0$, we will have, for any $\varepsilon > 0$, $L_{n,b}^{-1}(t + \varepsilon|\tau) \ge x - \varepsilon$ with probability approaching one.



□ 
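Proof. It suffices to show $L_{n,b}(x - \varepsilon|\tau) \le t + \varepsilon$ with probability approaching one. On the event $E_n \equiv \{|\tau_bS_n| \le \varepsilon\}$, which has probability approaching one, $L_{n,b}(x - \varepsilon|\tau) \le \tilde{L}_{n,b}(x|\tau)$. We also have $E[\tilde{L}_{n,b}(x|\tau)] = P(\tau_bS_b \le x) = J_b(x/\tau_b) \le t$ by assumption. Thus,

\begin{align*}
P\left(L_{n,b}(x - \varepsilon|\tau) \le t + \varepsilon\right) &\ge P\left(\{\tilde{L}_{n,b}(x|\tau) \le t + \varepsilon\} \cap E_n\right) \\
&\ge P\left(\{\tilde{L}_{n,b}(x|\tau) \le E[\tilde{L}_{n,b}(x|\tau)] + \varepsilon\} \cap E_n\right).
\end{align*}

This goes to one by standard arguments. □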

Lemma 10. Let Pa be the estimator defined in Section \6J\ of any other estimator such that 
Pa = — ^log &1-0 ''(1) ^ • Suppose that, for some > and j3u, Xun^"" < Jn^{t — e) eventually 
and b\^Sn — ?■ 0. Then, for any e > we will have Pa < Pu + ^ with probability approaching 
one. 

Proof. We have 

Pa = ] 7 + Op 1 = + Op 1 < Pu ] 7 + Op 1 ^ 

logoi logOl log&i 

where the inequahty holds with probability approaching one by Lemma [9l □ 

The following lemma shows that the asymptotic distribution of the KS statistic is strictly increasing on its support, which is needed for the estimates of the rate of convergence in Politis, Romano, and Wolf (1999) to converge at a fast enough rate that they can be used in the subsampling procedure.

Lemma 11. Under Assumptions U\, 0, 0. [7] o^ndl^ with part (ii) of Assumption\^ replaced 
by Assumption 7, if $S$ is convex, then the asymptotic distribution $S(Z)$ in Theorem 6 satisfies $P(S(Z) \in (a,\infty)) = 1$ for some $a$, and the cdf of $S(Z)$ is strictly increasing on $(a,\infty)$.

Proof. First, note that, for any concave functions $f_1,\ldots,f_{d_Y}$, $f_k : V_k \to \mathbb{R}$, for some vector space $V_k$, $x \mapsto S(f_1(x_1),\ldots,f_{d_Y}(x_{d_Y}))$ is convex, since, for any $\lambda \in (0,1)$,

\begin{align*}
&S\left(f_1(\lambda x_{a,1} + (1-\lambda)x_{b,1}),\ldots,f_{d_Y}(\lambda x_{a,d_Y} + (1-\lambda)x_{b,d_Y})\right) \\
&\le S\left(\lambda f_1(x_{a,1}) + (1-\lambda)f_1(x_{b,1}),\ldots,\lambda f_{d_Y}(x_{a,d_Y}) + (1-\lambda)f_{d_Y}(x_{b,d_Y})\right) \\
&\le \lambda S\left(f_1(x_{a,1}),\ldots,f_{d_Y}(x_{a,d_Y})\right) + (1-\lambda)S\left(f_1(x_{b,1}),\ldots,f_{d_Y}(x_{b,d_Y})\right)
\end{align*}

where the first inequality follows since $S$ is decreasing in each argument and by concavity of the $f_k$s, and the second follows by convexity of $S$.

$S(Z)$ can be written as, for some random processes $\mathbb{H}_1(t),\ldots,\mathbb{H}_{d_Y}(t)$ with continuous sample paths and $T \equiv \mathbb{R}^{2d}$, $S(\inf_{t \in T}\mathbb{H}_1(t),\ldots,\inf_{t \in T}\mathbb{H}_{d_Y}(t))$. Since the infimum of a real valued function is a concave functional, this is a convex function of the sample paths of $(\mathbb{H}_1(t),\ldots,\mathbb{H}_{d_Y}(t))$. The result follows from Theorem 11.1 in Davydov, Lifshits, and Smorodina (1998) as long as the vector of random processes can be given a topology for which this function is lower semi-continuous. In fact, this step can be done away with by noting that, for $T_0$ a countable dense subset of $T$ and $T_\ell$ the first $\ell$ elements of this subset, $S(\inf_{t \in T_\ell}\mathbb{H}_1(t),\ldots,\inf_{t \in T_\ell}\mathbb{H}_{d_Y}(t)) \stackrel{d}{\to} S(\inf_{t \in \mathbb{R}^{2d}}\mathbb{H}_1(t),\ldots,\inf_{t \in \mathbb{R}^{2d}}\mathbb{H}_{d_Y}(t))$ as $\ell \to \infty$, so, letting $F_\ell$ be the cdf of $S(\inf_{t \in T_\ell}\mathbb{H}_1(t),\ldots,\inf_{t \in T_\ell}\mathbb{H}_{d_Y}(t))$, applying Proposition 11.3 of Davydov, Lifshits, and Smorodina (1998) for each $F_\ell$ shows that $\Phi^{-1}(F_\ell(t))$ is concave for each $\ell$, so, by convergence in distribution, this holds for $S(Z)$ as well. □
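The proof relies on the fact that the infimum of a real valued function is a concave functional, that is, $\inf(\lambda f + (1-\lambda)g) \ge \lambda\inf f + (1-\lambda)\inf g$. The sketch below checks this inequality on randomly drawn functions over a finite index set; the finite index set, the Gaussian draws, and the name `inf_is_concave_on_sample` are all assumptions made for illustration, not part of the argument.

```python
import random


def inf_is_concave_on_sample(n_points=50, n_trials=200, seed=0):
    """Check inf(lam*f + (1-lam)*g) >= lam*inf(f) + (1-lam)*inf(g)
    for random f, g on a finite index set (a numerical spot check only)."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        f = [rng.gauss(0, 1) for _ in range(n_points)]
        g = [rng.gauss(0, 1) for _ in range(n_points)]
        lam = rng.random()
        mix = [lam * a + (1 - lam) * b for a, b in zip(f, g)]
        # Small tolerance guards against floating point rounding.
        if min(mix) < lam * min(f) + (1 - lam) * min(g) - 1e-12:
            return False
    return True


if __name__ == "__main__":
    print(inf_is_concave_on_sample())
```

The inequality holds pointwise because each value of the mixture is at least the same mixture of the two infima, which is the reason the composition with a decreasing convex $S$ is convex.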



The same result in Davydov, Lifshits, and Smorodina (1998) could also be used in the proof of Theorem 2 to show that the distribution of $S(Z)$ is continuous except possibly at the infimum of its support, but an additional argument would be needed to show that, if such an atom exists, it would have to be at zero. In the proof of Theorem 2, this is handled by using the results of Pitt and Tran (1979) instead.

We are now ready to prove Theorem 7.

Proof of Theorem 7. First, suppose that Assumption 1 holds with part (ii) of Assumption
1 replaced by Assumption 7 for some $\underline{\gamma} < \gamma < \overline{\gamma}$ and $\mathcal{X}_0$ nonempty. By Theorem 6,
$n^{(d_X+\gamma)/(d_X+2\gamma)}S(T_n(\theta))$ converges in distribution to a continuous distribution. Thus, by
Lemma 10, $\hat\beta_a \stackrel{p}{\to} (d_X+\gamma)/(d_X+2\gamma)$, so $\hat\beta_a > \underline{\beta} = (d_X+\overline{\gamma})/(d_X+2\overline{\gamma})$ with probability
approaching one. On this event, the test uses the subsample estimate of the $1-\alpha$
quantile with rate estimate $\hat\beta \wedge \overline{\beta}$. By Theorem 8.2.1 in Politis, Romano, and Wolf (1999),
$\hat\beta \wedge \overline{\beta} = (d_X+\gamma)/(d_X+2\gamma) + O_p((\log n)^{-1})$ as long as the asymptotic distribution of
$n^{(d_X+\gamma)/(d_X+2\gamma)}S(T_n(\theta))$ is increasing on the smallest interval $(k_0,k_1)$ on which the
asymptotic distribution has probability one. This holds by Lemma 11. By Theorem 8.3.1 in
Politis, Romano, and Wolf (1999), the $O_p((\log n)^{-1})$ rate of convergence for the rate
estimate $\hat\beta \wedge \overline{\beta}$ implies that the probability of rejecting converges to $\alpha$.

Next, suppose that Assumption 1 holds with part (ii) of Assumption 1 replaced by
Assumption 7 for $\gamma = \overline{\gamma}$. The test that compares $n^{1/2}S(T_n(\theta))$ to a positive critical value will
fail to reject with probability approaching one in this case, so, on an event with probability
approaching one, the test will reject only if $\hat\beta_a > \underline{\beta}$ and the subsampling test with rate $\hat\beta \wedge \overline{\beta}$
rejects. Thus, the probability of rejecting is asymptotically no greater than the probability
of rejecting with the subsampling test with rate $\hat\beta \wedge \overline{\beta}$, which has asymptotic level $\alpha$ under
these conditions by the argument above.

Now, consider the case where, for some xq G Xq and B < oo, rhj{6,x) < B\\x — a^olT 
for some 7 > 7. Let mj{Wi,9) = mj{Wi,6) + {B\\x - xop - Thj{9,x)). Then mj{Wi,9) > 
mj(Wi,9), and rhj(Wi,9) satisfies the assumptions of Theorems E] and [21 so 

^fe+7)/fe+27)5(7;(^)) > n(^^+^)/('^^+27)5(o,...,O,infKm,(Wi,,0)/(s < X, < s + t), 0, . . . , 0) 

s,t 

and the latter quantity converges in distribution to a continuous random variable that is 
positive with probability one. Thus, by Lemma [TUI for any e > 0, < {dx + 'y)/{dx + '2j)+e 
with probability approaching one. For e small enough, this means that f3a < {dx + 7)/ {dx + 
27) with probability approaching one. Thus, the procedure uses an asymptotically level a 
test with probability approaching one. 

The remaining case is where mj{9, x) is bounded from below away from zero. If nijiWi, 9) > 
for all j with probability one, S{Tn{9)) and the estimated 1 — a quantile will both be 
zero, so the probability of rejecting will be zero, so suppose that P{mj{Wi,9) < 0) > 
for some j. Then, for some 77 > 0, we have nS{Tn{9)) > rj with probability approach- 
ing one. From Lemma |9] (applied with t less that 1 — a and = b), it follows that 
L~1{1 - a\b^^'^) = b^^'^-'^L~l{l - a\b) > b^^'^-^ri/2 with probability approaching one. By 
Lemma El S{Tn{9)) will converge at a nlogn rate, so that n'^^^S(Tn{9)) < n^^^~^(\ognY 
with probability approaching one. Thus, we will fail to reject with probability approaching 
one as long as n''^^~-'^(logn)^ < b^^^~^r]/2 = n^3(/3A/3-i)^^2 for large enough n, and this holds 
since Xs < 1- A similar argument holds for L~\{1 — a\bl^^^). 

□ 
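
The subsampling step that the proof above relies on can be illustrated with a small numerical sketch. It is only a sketch under assumptions that are not taken from the paper: $X$ is scalar, there is a single moment function, $S(t) = \max(-t, 0)$, and the function names `ks_statistic` and `subsample_critical_value` are hypothetical. The rate exponent `beta` is passed in directly; the rate estimation step based on Politis, Romano, and Wolf (1999) is not shown.

```python
import numpy as np


def ks_statistic(x, m):
    """S(T_n) for scalar X and one moment, with the assumed S(t) = max(-t, 0).

    T_n is the infimum over intervals (s, s+t) of
    (1/n) * sum_i m_i * 1{s < x_i < s+t}.
    Assumes no ties in x, so that every run of consecutive order
    statistics corresponds to some interval.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    m_sorted = m[np.argsort(x)]
    # prefix[j] is the sum of the moments at the j smallest x values;
    # prefix[0] = 0 corresponds to the empty interval.
    prefix = np.concatenate(([0.0], np.cumsum(m_sorted)))
    # Minimum of prefix[j] - prefix[i] over i <= j (minimum run sum).
    min_sum = np.min(prefix - np.maximum.accumulate(prefix))
    return float(max(-min_sum / len(x), 0.0))


def subsample_critical_value(x, m, b, beta, alpha, n_draws=200, seed=0):
    """1 - alpha quantile of b**beta * S(T_b) over random subsamples of size b."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for r in range(n_draws):
        idx = rng.choice(len(x), size=b, replace=False)
        draws[r] = b ** beta * ks_statistic(x[idx], m[idx])
    return float(np.quantile(draws, 1 - alpha))
```

In this sketch, a test with rate exponent `beta` would reject when `len(x) ** beta * ks_statistic(x, m)` exceeds the value returned by `subsample_critical_value`.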

Testing Rate of Convergence Conditions: Estimating the Second 
Derivative 

proof of LemmalJl Let h{x) = mj{9, x)— min^./^^ mj{9, x) where rhj{9, x) = E{mj{Wi, 9)\Xi = 
x) for a continuous version of the conditional mean function. First, note that $\mathcal{X}_0$ is compact.
Since each $x \in \mathcal{X}_0$ is a local minimizer of $h(x)$ such that the second derivative matrix is
strictly positive definite at $x$, there is an open set $A(x)$ containing each $x \in \mathcal{X}_0$ such that
$h(x) > 0$ on $A(x)\setminus\{x\}$. The sets $A(x)$ with $x$ ranging over $\mathcal{X}_0$ form a covering of $\mathcal{X}_0$ with
open sets. Thus, there is a finite subcover $A(x_1), \ldots, A(x_\ell)$ of $\mathcal{X}_0$. Since the only elements in
$A(x_1) \cup \cdots \cup A(x_\ell)$ that are also in $\mathcal{X}_0$ are $x_1, \ldots, x_\ell$, this means that $\mathcal{X}_0 = \{x_1, \ldots, x_\ell\}$. □

proof of Theorem[TM By the next lemma, we will have Ag C Aq C U^^^-^^Bi;^{xj^k) and Aq C 
Xq C Ufc s.t. jGJ{k)Be„{xk) with probability approaching one. When this holds, we will have 
^ < ^ '^(^)}| by construction and, once is less than the smallest distance between any 
two points in X^, we will also have ij = \{k\j G J{k)}\ and, for each k from 1 to ij, we will 
have, for some function r(j, k) such that r(j, ■), is bijective from {1, . . . , ij} to {A;|j G 

G for each j. A;. When this holds, all of the Xj^kS with r(j, /c) equal will be in 

the same equivalence class, since the corresponding e„ neighborhoods will intersect. When 
En is small enough that e„ neighborhoods containing Xr and neighborhoods containing Xg 
do not intersect for r ^ s, there will be exactly i equivalence classes, each one corresponding 
to the (j, k) indices such that r{j, k) is the same. Let the labeling of the XsS be such that, 
for all s, Xs = Xj^k for some (j, k) such that r{j, k) = s. Then, for each s, we have, for some 
(j, fc) such that r{j,k) = s, Xg = Xr(j^k) ^ -Se„(aij,fc) = B^^{xs) with probability approaching 
one so that Xg A Xg. To verify that J{s) = J{s) with probability approaching one, note 
that, for j G J{s), we will have Xg E Xq C UkB^^^{xj^k) and Xg G Be^{xg) eventually, and, 
when this holds, \y\kBeS^3,k)] H B^^^Xg) 7^ so that j G J{s). For j ^ ^(s), each Xj^k will 
eventually be within e„ of some x,. with r ^ s, while indices (j', k') in the equivalence class 
associated with s will eventually have Xji^k' within 2e of Xg, so that (j, /c) will not be in the 
equivalence class associated with s for any k, and j ^ ^(s). □ 

Lemma 12. Suppose that sup^.^^) ||mj(6', x) —frij{6, x)\\ = 0{an) for some sequence an — )■ 0. 
Then, under Assumption [H, for any sequence 6„ — 00 with bnan — ?■ and En with e„ — )■ 
more slowly than \/hnan, the set X^ = {x\mj{9,x) < bnttn} satisfies 

'^O — '^0 — s.t. j€J{k)Be,XXk) 

Proof. We will have Xq C Xq as soon as sup^^^, \\mj{9,x) — mj{9,x)\\ < bnan, which hap- 
pens with probability approaching one. To show that X^ C Uk s.t. jej{k)Bs„{xk) eventu- 
ally, suppose that, for some x G A'q, x ^ B^^{xk) for any k. Let C and rj be such that 
mj{6,x) > C miufc ||x — when ||x — Xjt|| < f] for some k (such a C and exist by As- 
sumption [Tl). Then, for any x such that rhj{9,x) < bnan, we must have, with probability 
approaching one, 

$$C \min_k \|x - x_k\|^2 \le \bar m_j(\theta, x) \le b_n a_n + \bar m_j(\theta, x) - \hat{\bar m}_j(\theta, x) \le 2 b_n a_n$$
where the first inequality follows since $\hat{\mathcal{X}}_0^j$ is contained in $\{x : \|x - x_k\| \le \eta \text{ some } k \text{ s.t. } j \in J(k)\}$
eventually. Since $\varepsilon_n \ge \sqrt{2 b_n a_n / C}$ eventually, the first claim follows. □
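
The thresholding idea in Lemma 12 can be sketched numerically as follows. This is a simplified illustration, not the paper's procedure: it assumes scalar $x$, an estimate of the conditional mean already evaluated on a grid, and a simple rule that groups selected grid points whenever consecutive points are within $2\varepsilon_n$ of each other. The function name `estimate_contact_points` and its arguments are hypothetical; `threshold` plays the role of $b_n a_n$ and `eps` the role of $\varepsilon_n$.

```python
import numpy as np


def estimate_contact_points(grid, m_hat, threshold, eps):
    """Group grid points where m_hat <= threshold into clusters.

    Returns one representative (the cluster mean) per cluster. Two
    consecutive selected points belong to the same cluster when they are
    at most 2 * eps apart, mimicking intersecting eps-neighborhoods.
    """
    grid = np.asarray(grid, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    selected = np.sort(grid[m_hat <= threshold])
    if selected.size == 0:
        return []
    # Start a new cluster wherever the gap to the previous point exceeds 2 * eps.
    breaks = np.where(np.diff(selected) > 2 * eps)[0] + 1
    return [float(g.mean()) for g in np.split(selected, breaks)]
```

With a conditional mean that touches zero at two well separated points, the sketch returns two representatives once `eps` is small relative to the distance between them, which is the situation the lemma describes.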

Local Alternatives 

proof of Theorem Everything is the same as in the proof of Theorem [1], but with the 
following modifications. 

First, in the proof of Theorem [T^ we need to show that, for all j, 

n 

{En - E)[mj{Wi, do + a„) - mj{Wi, 9q)\I{Ks < Xi - Xk < K{s + t)) 



converges to zero uniformly over < M for any fixed M. By Theorem 2.14.1 in 

van der Vaart and Wellnerl (jl996l ). the norm of this is bounded up to a constant by 



J{l,J^n,L2)ja^/EFjJC~W^, where = {{x,w) ^-^ [mj{w,9o + an) - mj{w,9o)]I{hnS < 

"'n 

X — Xk < hn{s + t))\{s,t) G M?^} and Fn{x,w) = \mj{w,6o + an) — rnj{w,6o)\I{—hnML < 
X — Xk < 2hnML) is an envelope function for this class (here t is a vector of ones). The 
covering numbers of the J-'nS are uniformly bounded by a polynomial, so that we just need 
to show that ~[^J EFn{Xi, Wi)"^ converges to zero. We have 



'EFn{x,,w;, 



n 
1 



1 



EE{[mj{Wu 00 + an) - m,{Wi, eo)]^\Xi}I{-hnML <X,-Xk< 2/i„M0 



< -j=^EI{~KMl <Xi-Xk< 2hnMi) sup E{[mj{Wi, Oq + an) - m,{Wi, eo)Y\X, = x} 

yhn \\x-Xk\\<ri 

where the first equality uses the law of iterated expectations and the second holds eventually 
with rj chosen so that the convergence in Assumption [T3] is uniform over ||x — a;fc|| < rj. The 
first term is bounded eventually by / J_mi.<x<2Ml where / is a bound for the density of 
Xi in a neighborhood of Xk (this follows from the same change of variables as in other parts 
of the proof). The second term converges to zero by Assumption | 
Next, in the proof of Theorem [151 we need to show that 



"'n 



j^E[mj{6o + an,Xi) - mj{9o, Xi)]I{hnS < Xi - Xk < Ki-s + t)) fx{xk)mej{dQ,Xk)aW_ti 
uniformly in $\|(s,t)\| \le M$. We have

-^E[fhj{eQ + a„,Xi) - mj{9Q,Xi)\I{hnS < X-i - Xk < K{s + t)) - f xixkirfie.jiOo, Xk)a^ti 



1 



/ {[mjiOo + a„,x) - mj{eo,x)]fx{x) - hlfx{xk)m0j{eo,Xk)a} dx 

n J hnS<X~'X^.<hn{s+t) 

{/i^^[mj(6'o + a„, hnX + Xk) - mj{9o, Kx + Xk)\fx{hnX + Xk) - fx{xk)rhgj{6o,Xk)a} dx 

s<x<s+t 

where the second equality comes from the change of variable $x \mapsto h_n x + x_k$. This will go to
zero uniformly in $\|(s,t)\| \le M$ as long as $\sup_{\|x\| \le 2M} \|f_X(h_n x + x_k) - f_X(x_k)\|$ and

$$\sup_{\|x\| \le 2M} \big\|h_n^{-2}[\bar m_j(\theta_0 + a_n, h_n x + x_k) - \bar m_j(\theta_0, h_n x + x_k)] - \bar m_{\theta,j}(\theta_0, x_k)a\big\|$$

both go to zero. $\sup_{\|x\| \le 2M} \|f_X(h_n x + x_k) - f_X(x_k)\|$ goes to zero by continuity of $f_X$ at $x_k$.
As for the other expression, since $a h_n^2 = a_n$, the mean value theorem shows that this is equal
to $\bar m_{\theta,j}(\theta^*(a_n), h_n x + x_k)a - \bar m_{\theta,j}(\theta_0, x_k)a$ for some $\theta^*(a_n)$ between $\theta_0$ and $\theta_0 + a_n$. This goes
to zero by Assumption 12.

In verifying the conditions of Lemma[2], we need to make sure the bounds, gp,xk,j,a{s, t) > 

9n,Xf^,j,a 

{s,t) = ^Emj{W„9o + an)I{Ks < Xi < hn{s + t)) > C\\{s,t)fY[u 

" i 

still hold for > M for M large enough and, for the latter function, ||(s,t)|| < h~^ri 

for some r] > and n greater than some that does not depend on M. We have 

5'p,xfej>(s,^) = gp,xk,j{s,t) +rh0j{eo,Xk)afxixk)Ylti > C\\{s,t)fYl^i + '^s,jiOo,Xk)afx{xk)Yl^i 

i i i 

= \\{sM'[C + rno,{9o,Xk)afx{xk)/Us,tn']ll^^ 

i 

where the first inequality follows from the bound in the original proof. For $\|(s,t)\| \ge M$
for $M$ large enough, this is greater than or equal to $K\|(s,t)\|^2 \prod_i t_i$ for
$K = C - |\bar m_{\theta,j}(\theta_0, x_k)a| f_X(x_k)/M^2 > 0$. For $g_{n,x_k,j,a}(s,t)$, we have

\\gp,xk,j,a{s,t) - gp,xk,j{s,t)\\ = \\--^E[mj{Wi,eo + an) -mj{Wi,eo)]I{hnS < Xi < /i„(s + t))|| 
< sup || — [mj(6'o + a„,x) - mj(6'o,a;)]||||T^ 

By the mean value theorem, fhj{6Q + a„,z) — fhj{6o,x) = fhj g{6* {an) , x)an for some 6*{an) 
between 6q and 6'o + a„. By continuity of the derivative as a function of {6,x), for small 
enough i] and n large enough, ThjQ(9*{an),x) is bounded from above, so that ||7^[^^j(6'o + 
an,x)—mj{9Q,x)]\\ is bounded by a constant times ||a„||//;,^ = ||a||. By continuity of /x at a;^, 
\\jjEI{hnS < Xi < hn{s + t))\\ is bounded by some constant times Yli^i II (-55^) II — f^n^V- 
Thus, for $M \le \|(s,t)\| \le h_n^{-1}\eta$ for the appropriate $M$ and $\eta$, we have, for some constant $C_1$,

$$g_{n,x_k,j,a}(s,t) \ge g_{n,x_k,j}(s,t) - C_1 \prod_i t_i \ge C\|(s,t)\|^2 \prod_i t_i - C_1 \prod_i t_i = \|(s,t)\|^2 \left[C - C_1/\|(s,t)\|^2\right] \prod_i t_i$$

where the second inequality uses the bound from the original proof. For $M$ large enough,
this gives the desired bound with the constant equal to $C - C_1/M^2 > 0$.

In verifying the conditions of Lemma O we also need to make sure the argument in 
Lemma H] still goes through when m(PFj,6'o) is replaced by m{Wi,9Q + an). To get the 
lemma to hold (with the constant C depending only on the distribution of X and the Y in 
Assumption [T4l) . we can use the same proof, but with the classes of functions J^n defined to be 
J^n = {{x, w) mj{w, 9o + an)I{hnSo < x — Xk < hn{sQ + t))\t < to} (-'^(l, ^n, L"^) is bounded 
uniformly for these classes because the covering number of each Tn is bounded by the same 
polynomial), and using the envel ope function Fn(x,w) = Yl j hnSo < x — Xk < hn{so + to)) 



when applying Theorem 2.14.1 in van der Vaart and Wellner (1996).

□ 

proof of Theorem^I^ First, note that, for any neighborhoods B{xk) of the elements of A'o, 
y/nmfs,tEnmj{Wi,9o + an)I{s < X < s + t) = ^/nm^(^s,s+t)eu^: j^j^Bix^) Enmj{Wi,9o + 
an)I{s < Xi < s + 1) + Op(l) since, if these neighborhoods are made small enough, we will 
have, for any (s, s+t) not in one of these neighborhoods, ErrijiWi, 9o+an)I{s < Xi < s+t) > 
BP[s < Xj < s + 1) by an argument similar to the one in Lemma [5l so that an argument 
similar to the one in LemmaOwill show that inf(s ,,+j)gu^ j^j(k)B(x,,) EnrrijiWi, 6'o + a„)/(s < 
Xi < s + t) converges to zero at a faster than $\sqrt{n}$ rate (Assumption 12 guarantees that
$E[m_j(W_i, \theta_0 + a_n)|X]$ is eventually bounded away from zero outside of any neighborhood of
$\mathcal{X}_0$ so that a similar argument applies).

Thus, the result will follow once we show that, for each j and k such that j G J{k), 



With this in mind, fix j and k with j G J{k). 

Let (s*, t*) minimize Enmj{Wi, 6'o + a„)/(s < X < s + t) over B{xkY (and be chosen from 
the set of minimizers in a measurable way). First, I show that p(0, (s*, t*)) A where p is the 
covariance semimetric p((s, t), (s', t')) = f ar(mj(iyi, Oq)I{s < x < s + t) — rrijiWi, 9q)I{s' < 
X < s' + 1')). To show this, note that, for any e > 0, ErrijiWi, Oq + a„)/(s < Xj < s + 1) is 
bounded from below away from zero for p(0, (s, t)) > e for large enough n. To see this, note 
that, for p(0, (s,t)) > Hj^i — ^^r some constant so that ||(s,t)|| > K^/'^ and, for 
some constant C and a bound / for /x on B{xk), 



ErrijiWi, 9o + an)I{s <Xi<s + t) 

= Emj{Wu 9o)Iis <Xi<s + t) + E[mj{9o + a„, Xi) - mj{9o, Xi)]I{s < Xi < s + t) 



in this display will be positive and bounded away from zero for large enough n. Thus, we can 
write y/nEnUijiWi, 6^0 + a„)/(s < Xi < s + t) as the sum of y/n{En — E)mj{Wi, 9q + a„)/(s < 
Xi < s + 1), which is Op{l) uniformly in (s,t), and y/nEmj{Wi,9Q + an)I{s < X < s + t), 
which is bounded from below uniformly in p(0, {s,t)) > e by a sequence of constants that 
go to infinity. Thus, infp(o,(s,i))>e -JnEnrrijiWi, 9q + an)I{s < X < s + t) is greater than zero 
with probability approaching one, so p(0, (s*,t*)) A 0. 



^/n inf Enmj(Wi, 9o + an)I(s < Xi < s + t) 





> Ci||(s,t)f- sup \\mj{9o + an,x) - mj{9o,x)\\f K. 



x€B{xk) 



By Assumption [131 snp^^six^^ 



nij {9q + a„, x) — rrij {9o, x) \\ converges to zero, so the last term 
Thus, for some sequence of random variables $\varepsilon_n \to 0$,



y/n'mi EnTJijCWi, 9q + an)I{s < X < s + t) 



s,t 

n 



"J 

inf EnmAWi,6Q + an)I{s < X < s + t). 

p(ld,(s* ,t*))<e„,(s,s+t)&B{xk) 



This is equal to ^/nmi p^Q^(^s\t-))<e^,{s,s+t)eB{xk) EmjiWi.Oo + an)I{s < X < s + t) plus a 
term that is bounded by \/nsupp(o,(,*,t.))<£„,(,,,+t)gB(^^) |(K - E)Enmj{Wi,eQ + a„)/(s < 
X < s + By Assumption [T3] and an arg ument using the maximal inequality in Theorem 



2.14.1 in Ivan der Vaart and Wellnerl (Il996l ). \/^sup(^ \{En — E)[mj{Wi,6Q + a„) — 

$m_j(W_i, \theta_0)]I(s < X_i < s + t)|$ converges in probability to zero. $\sqrt{n}(E_n - E)m_j(W_i, \theta_0)I(s <
X_i < s + t)$ converges in distribution under the supremum norm to a mean zero Gaussian
process $\mathbb{H}(s,t)$ with covariance kernel $cov(\mathbb{H}(s,t), \mathbb{H}(s',t')) = cov(m_j(W_i, \theta_0)I(s <
X_i < s + t), m_j(W_i, \theta_0)I(s' < X_i < s' + t'))$ and almost sure $\rho$ continuous sample paths.
Since $(z, \varepsilon) \mapsto \sup_{\rho(0,(s,t)) \le \varepsilon} |z(s,t)|$ is continuous in $C(\mathbb{R}^{2d}, \rho) \times \mathbb{R}$ (where $C(\mathbb{R}^{2d}, \rho)$ is
the space of $\rho$ continuous functions on $\mathbb{R}^{2d}$) under the product norm of the supremum
norm and the Euclidean norm, by the continuous mapping theorem, $\sup_{\rho(0,(s,t)) \le \varepsilon_n} |\sqrt{n}(E_n -
E)m_j(W_i, \theta_0)I(s < X_i < s + t)| \stackrel{d}{\to} \sup_{\rho(0,(s,t)) \le 0} |\mathbb{H}(s,t)| = 0$ (the last step follows since
$var(\mathbb{H}(s,t)) = 0$ whenever $\rho(0,(s,t)) = 0$).
Thus,

y/n inf Enmj{Wi,9o + an)I{s < Xi < s + t) 

(s,s+t)GB(xfc) 



n inf 

p{0,{s,i))<£„,(s,s+t)eB(xfc) 



Emj{Wi, 6*0 + a„)/(s < Xi < s + t) + Op{l] 



n inf 

p{0,{s,t))<£„,(s,s+t)€B{xi,) Js<:x<s+t 



/ mj{9Q + an,x)fx{x)dx + 0p{l). 

J s<x<s+t 



By Assumption 12, the integrand is positive eventually for $\|(s - x_k, t)\| > \eta$ for any $\eta > 0$,
and once this holds, the infimum will be achieved on $\|(s - x_k, t)\| \le \eta$. Using a first order
Taylor expansion in the first argument of $\bar m_j(\theta_0 + a_n, x)$ and a second order Taylor expansion
in the second argument, the integrand is equal to

$$\left[\tfrac{1}{2}(x - x_k)V_j(x^*(x))(x - x_k)' + \bar m_{\theta,j}(\theta^*(a_n), x)a_n\right] f_X(x)$$

for some $x^*(x)$ between $x$ and $x_k$ and $\theta^*(a_n)$ between $\theta_0$ and $\theta_0 + a_n$. For $\eta$ small enough,
continuity of the derivatives at $(\theta_0, x_k)$ guarantees that this is bounded from below by
$C_1\|x - x_k\|^2 - C_2\|a_n\|$ for some constants $C_1$ and $C_2$, so the integrand is positive for $\|x - x_k\|$ greater than
$C\sqrt{\|a_n\|}$ for some large $C$, so that the infimum will be taken on $\|(s - x_k, t)\| \le C\sqrt{\|a_n\|}$.
Thus, we have
^A^ inf Enmj{Wi, 9q + an)I{s < Xi< s + t) 

{s,s+t)£B{xk) 



n inf 

p{Q,{s,t))<en,\\{s-Xk,t)\\<C^J\\an\\ Js<x<s+t 

This will be equal up to o(l) to the infimum of 



/ mj{6o + an,x)fx{x) dx + Op{l). 

J s<x<s+t 



s<x<s+t 



-{x - Xk)Vj{xk){x - Xk)' + me,j{9o, Xk)ar, 



fx{xk)dx 



once we show that the difference between this expression and y/n Jg^^^g.f iTijiOo+an, x)fx{x) dx 



goes to zero uniformly over ||(s — < C-\/||an|| (the infimum of this last display will 

be taken at a sequence where — x^, t)|| < C-\/||a„|| anyway, so that the infimum can be 
taken over all of R^*^) . 

The difference between these terms is 



s<x<s+t 



-{x - Xk)Vj{xk){x - Xk)' + m0j{9o,Xk)an [fx{x) - fx{xk)] dx 
+ \/n I Xk)Vj{x*{x)){x - Xk)' - {x- Xk)Vj{xk){x - Xk)'] fx{x) dx 

J s<x<s+t ^ 

+ \/n / [me,j{9*{an),x) - me,j{9o,Xk)]anfx{x)dx. 

J s<x<s+t 

These can all be bounded using the change of variables $u = (x - x_k)n^{1/(2(d+2))}$ and the
continuity of densities, conditional means, and their derivatives. The first term is



ln^/Wd+2))(^s-Xk)<u<{s+t-Xk)n^/('2('^+'^)) 

X [fx{n-'/^'^'+'K + Xk) - fx{xk)]n-'/^'^'+'^^ du 

"1 

l/(2(d+2)) (s-Xk)<u<{s+t-Xk)n^/(^(^+^)) . 



^uV,{xk)u'n-^/^'^+^^ + mgj{9o, Xk)an-^/^'^+^^ 



I 



^uVj{xk)u' + mej{9o, Xk)a 



The integrand converges to zero uniformly over $u$ in any bounded set by the continuity
of $f_X$ at $x_k$, and the area of integration is bounded by $\|u\| \le 2n^{1/(2(d+2))}\|(s - x_k, t)\| \le
2Cn^{1/(2(d+2))}\sqrt{\|a\|}\,n^{-1/(2(d+2))} = 2C\sqrt{\|a\|}$ on $\|(s - x_k, t)\| \le C\sqrt{\|a_n\|}$. Using the same
change of variables, the second term is bounded by the integral of

i Ky,(..(n-./.^<-^..„ + - uV,Mu'] + .0 

over a bounded region, and this converges to zero uniformly in any bounded region by 
continuity of the second derivative matrix. The last term is, by the same change of variables, 
bounded by the integral of 



n 



-l/(2(d+2)) 



U 



+ Xk) - me J (6^0, Xk)] afx{n 



-l/(2(d+2)) 



U + Xk) 



over a bounded region, and this converges to zero by continuity of $\bar m_{\theta,j}(\theta, x)$ at $(\theta_0, x_k)$.
Thus, 



^/n inf EnmAWi, Oq + a„)/(s < Xi < s + t) 

{s,s+t)eB{xk) 



inf 

{s-k,t)\\<Cy/\\an\\ J s<x<s+t 

inf 

(s— a;fc,t)||<C-v/||a|| J {s-Xk)<u<{s—Xk-\-t) 



^(x - Xk)Vj{xk){x -Xk)' + me J {00, Xjt)a„ 



^uVj{xk)u' + m0j{eo, Xk)ar, 



fx{xk) dx + Op(l) 
fx{xk)du + Op{l) 



where the last equality follows from the same change of variables and a change of coordinates
in $(s,t)$. The result follows since, for large enough $C$, the unconstrained infimum is taken on
$\|(s - x_k, t)\| \le C\sqrt{\|a\|}$, and $C$ can be chosen arbitrarily large. □
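
The change of variables $u = (x - x_k)n^{1/(2(d+2))}$ used in this proof can be checked by counting powers of $n$. The following worked restatement is a sketch that assumes, as the displays in the proof suggest, that $a_n = a\,n^{-1/(d+2)}$ and that $V_j(x_k)$ is the second derivative matrix of $\bar m_j(\theta_0, x)$ in $x$ at $x_k$; the integral on the right is over $(s - x_k)n^{1/(2(d+2))} < u < (s + t - x_k)n^{1/(2(d+2))}$.

```latex
% u = (x - x_k) n^{1/(2(d+2))}  implies
% x - x_k = n^{-1/(2(d+2))} u   and   dx = n^{-d/(2(d+2))} du
\begin{align*}
&\sqrt{n}\int_{s<x<s+t}
  \Big[\tfrac12 (x-x_k)V_j(x_k)(x-x_k)' + \bar m_{\theta,j}(\theta_0,x_k)a_n\Big]
  f_X(x_k)\,dx \\
&\quad = n^{\frac12}\, n^{-\frac{1}{d+2}}\, n^{-\frac{d}{2(d+2)}}
  \int \Big[\tfrac12\, uV_j(x_k)u' + \bar m_{\theta,j}(\theta_0,x_k)a\Big]
  f_X(x_k)\,du, \\
&\tfrac12 - \tfrac{1}{d+2} - \tfrac{d}{2(d+2)}
  = \tfrac{(d+2) - 2 - d}{2(d+2)} = 0 .
\end{align*}
```

Under these assumptions the powers of $n$ cancel exactly, so the rescaled objective does not depend on $n$, which is why the limiting infimum in the last display is nondegenerate.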



References 

Adler, R. J. (1990): "An Introduction to Continuity, Extrema, and Related Topics for
General Gaussian Processes," Lecture Notes-Monograph Series, 12, i-155.

Andrews, D. W., S. Berry, and P. Jia (2004): "Confidence regions for parameters
in discrete games with multiple equilibria, with an application to discount chain store
location."

Andrews, D. W., and P. Guggenberger (2009): "Validity of Subsampling and 'Plug-in
Asymptotic' Inference for Parameters Defined by Moment Inequalities," Econometric
Theory, 25(03), 669-709.

Andrews, D. W., and X. Shi (2009): "Inference Based on Conditional Moment Inequalities,"
Unpublished Manuscript, Yale University, New Haven, CT.

Andrews, D. W. K., and P. Jia (2008): "Inference for Parameters Defined by Moment
Inequalities: A Recommended Moment Selection Procedure," SSRN eLibrary.

Andrews, D. W. K., and G. Soares (2010): "Inference for Parameters Defined by
Moment Inequalities Using Generalized Moment Selection," Econometrica, 78(1), 119-157.

Armstrong, T. (2011): "Weighted KS Statistics for Inference on Conditional Moment
Inequalities," Unpublished Manuscript.

Beresteanu, A., and F. Molinari (2008): "Asymptotic Properties for a Class of Partially
Identified Models," Econometrica, 76(4), 763-814.

Bierens, H. J. (1982): "Consistent model specification tests," Journal of Econometrics,
20(1), 105-134.

Bugni, F. A. (2010): "Bootstrap Inference in Partially Identified Models Defined by Moment
Inequalities: Coverage of the Identified Set," Econometrica, 78(2), 735-753.

Chernozhukov, V., H. Hong, and E. Tamer (2007): "Estimation and Confidence
Regions for Parameter Sets in Econometric Models," Econometrica, 75(5), 1243-1284.

Chernozhukov, V., S. Lee, and A. M. Rosen (2009): "Intersection bounds: estimation
and inference," Arxiv preprint arXiv:0907.3503.

Chetty, R. (2010): "Bounds on Elasticities with Optimization Frictions: A Synthesis of
Micro and Macro Evidence on Labor Supply," NBER Working Paper.

Ciliberto, F., and E. Tamer (2009): "Market structure and multiple equilibria in airline
markets," Econometrica, 77(6), 1791-1828.

Davydov, Y. A., M. A. Lifshits, and N. V. Smorodina (1998): Local Properties of
Distributions of Stochastic Functionals. American Mathematical Society.

Galichon, A., and M. Henry (2009): "A test of non-identifying restrictions and confidence
regions for partially identified parameters," Journal of Econometrics, 152(2), 186-196.

Ichimura, H., and P. E. Todd (2007): "Chapter 74 Implementing Nonparametric and
Semiparametric Estimators," vol. 6, Part 2, pp. 5369-5468. Elsevier.

Imbens, G. W., and C. F. Manski (2004): "Confidence Intervals for Partially Identified
Parameters," Econometrica, 72(6), 1845-1857.

Khan, S., and E. Tamer (2009): "Inference on endogenously censored regression models
using conditional moment inequalities," Journal of Econometrics, 152(2), 104-119.

Kim, J., and D. Pollard (1990): "Cube Root Asymptotics," The Annals of Statistics,
18(1), 191-219.

Kim, K. I. (2008): "Set estimation and inference with models characterized by conditional
moment inequalities."

Koenker, R., and G. Bassett (1978): "Regression Quantiles," Econometrica, 46(1),
33-50.

Koenker, R., and G. Bassett (1982): "Robust Tests for Heteroscedasticity Based on
Regression Quantiles," Econometrica, 50(1), 43-61.

Kosorok, M. R. (2008): Introduction to Empirical Processes and Semiparametric Inference.

Lee, S., K. Song, and Y. Whang (2011): "Testing functional inequalities."

Lehmann, E. L., and J. P. Romano (2005): Testing statistical hypotheses. Springer.

Manski, C. F. (1990): "Nonparametric Bounds on Treatment Effects," The American
Economic Review, 80(2), 319-323.

Manski, C. F., and E. Tamer (2002): "Inference on Regressions with Interval Data on a
Regressor or Outcome," Econometrica, 70(2), 519-546.

Menzel, K. (2008): "Estimation and Inference with Many Moment Inequalities," Preprint,
Massachusetts Institute of Technology.

Moon, H. R., and F. Schorfheide (2009): "Bayesian and Frequentist Inference in Partially
Identified Models," National Bureau of Economic Research Working Paper Series,
No. 14882.

Pagan, A., and A. Ullah (1999): Nonparametric econometrics. Cambridge University
Press.

Pakes, A., J. Porter, K. Ho, and J. Ishii (2006): "Moment Inequalities and Their
Application."

Pitt, L. D., and L. T. Tran (1979): "Local Sample Path Properties of Gaussian Fields,"
The Annals of Probability, 7(3), 477-493.

Politis, D. N., J. P. Romano, and M. Wolf (1999): Subsampling. Springer.

Pollard, D. (1984): Convergence of stochastic processes. David Pollard.

Ponomareva, M. (2010): "Inference in Models Defined by Conditional Moment Inequalities
with Continuous Covariates."

Romano, J. P., and A. M. Shaikh (2008): "Inference for identifiable parameters in
partially identified econometric models," Journal of Statistical Planning and Inference,
138(9), 2786-2807.

Romano, J. P., and A. M. Shaikh (2010): "Inference for the Identified Set in Partially
Identified Econometric Models," Econometrica, 78(1), 169-211.

Stoye, J. (2009): "More on Confidence Intervals for Partially Identified Parameters,"
Econometrica, 77(4), 1299-1315.

van der Vaart, A. W., and J. A. Wellner (1996): Weak convergence and empirical
processes. Springer.

Wright, J. H. (2003): "Detecting Lack of Identification in GMM," Econometric Theory,
19(02), 322-330.



Figure 1: Case with faster than root-n convergence of KS statistic
Figure 2: Cases with root-n convergence of KS statistic and faster rates ($\beta_2$)
Figure 5: Histograms for $n^{3/5}S(T_n(\theta))$ for Design 1 ($n^{3/5}$ Convergence)
Figure 6: Histograms for $n^{1/2}S(T_n(\theta))$ for Design 2 ($n^{1/2}$ Convergence)
Figure 7: Data for Empirical Illustration
Figure 8: 95% Confidence Region Using Estimated Rate
Figure 10: 95% Confidence Region Using LAD with Points
                                        n = 100   n = 500   n = 1000   n = 2000   n = 5000

nominal 90% coverage
  estimated rate                         0.873     0.890     0.897      0.889      0.879
  conservative rate ($n^{1/2}$)          0.991     0.987     0.987      0.995      0.996
  (infeasible) exact rate ($n^{3/5}$)    0.921     0.909     0.905      0.903      0.890

nominal 95% coverage
  estimated rate                         0.940     0.943     0.954      0.947      0.934
  conservative rate ($n^{1/2}$)          0.998     1.000     0.998      1.000      0.999
  (infeasible) exact rate ($n^{3/5}$)    0.976     0.965     0.949      0.956      0.953

Table 1: Coverage Probabilities for Design 1

                                        n = 100   n = 500   n = 1000   n = 2000   n = 5000

nominal 90% coverage
  estimated rate                         0.780     0.910     0.928      0.925      0.924
  conservative rate ($n^{1/2}$)          0.949     0.947     0.938      0.932      0.924
  (infeasible) exact rate ($n^{1/2}$)    0.949     0.947     0.938      0.932      0.924

nominal 95% coverage
  estimated rate                         0.885     0.945     0.966      0.971      0.979
  conservative rate ($n^{1/2}$)          0.991     0.982     0.975      0.974      0.979
  (infeasible) exact rate ($n^{1/2}$)    0.991     0.982     0.975      0.974      0.979

Table 2: Coverage Probabilities for Design 2

                                        n = 100   n = 500   n = 1000   n = 2000   n = 5000

nominal 90% coverage
  estimated rate                         0.26      0.13      0.08       0.06       0.03
  conservative rate ($n^{1/2}$)          0.33      0.17      0.12       0.09       0.06
  (infeasible) exact rate ($n^{3/5}$)    0.21      0.10      0.07       0.05       0.03

nominal 95% coverage
  estimated rate                         0.35      0.17      0.11       0.07       0.05
  conservative rate ($n^{1/2}$)          0.39      0.22      0.15       0.11       0.07
  (infeasible) exact rate ($n^{3/5}$)    0.29      0.13      0.09       0.06       0.04

Table 3: Mean of $u_{1-\alpha} - \theta_{1,D1}$ for Design 1





                                        n = 100   n = 500   n = 1000   n = 2000   n = 5000

nominal 90% coverage
  estimated rate                         0.11      0.08      0.06       0.04       0.02
  conservative rate ($n^{1/2}$)          0.20      0.09      0.06       0.04       0.02
  (infeasible) exact rate ($n^{1/2}$)    0.20      0.09      0.06       0.04       0.02

nominal 95% coverage
  estimated rate                         0.18      0.10      0.07       0.05       0.03
  conservative rate ($n^{1/2}$)          0.27      0.11      0.08       0.05       0.03
  (infeasible) exact rate ($n^{1/2}$)    0.27      0.11      0.08       0.05       0.03

Table 4: Mean of $u_{1-\alpha} - \theta_{2,D2}$ for Design 2





                        $\theta_1$      $\theta_2$

Estimated Rate          [-48, 84]       [0.0113, 0.0342]
Conservative Rate       [-60, 138]      [0.0030, 0.0372]
LAD with Points         [-63, 63]       [0.0100, 0.0244]

Table 5: 95% Confidence Intervals for Components of $\theta$